Abstract

Consider N × N hermitian or symmetric random matrices H with independent entries, where the distribution of the (i, j) matrix element is given by the probability measure $\nu_{ij}$ with zero expectation and with variance $\sigma_{ij}^2$. We assume that the variances satisfy the normalization condition $\sum_i \sigma_{ij}^2 = 1$ for all j and that there is a positive constant c such that $c \le N\sigma_{ij}^2 \le c^{-1}$. We further assume that the probability distributions $\nu_{ij}$ have a uniform subexponential decay. We prove that the Stieltjes transform of the empirical eigenvalue distribution of H is given by the Wigner semicircle law uniformly up to the edges of the spectrum with an error of order $(N\eta)^{-1}$, where $\eta$ is the imaginary part of the spectral parameter in the Stieltjes transform. There are three corollaries to this strong local semicircle law: (1) Rigidity of eigenvalues: if $\gamma_j = \gamma_{j,N}$ denotes the classical location of the j-th eigenvalue under the semicircle law ordered in increasing order, then the j-th eigenvalue $\lambda_j$ is close to $\gamma_j$ in the sense that for some positive constants C, c

$$\mathbb{P}\Big(\exists j:\ |\lambda_j - \gamma_j| \ge (\log N)^{C\log\log N}\,\big[\min(j, N-j+1)\big]^{-1/3} N^{-2/3}\Big) \le C\exp\big[-(\log N)^{c\log\log N}\big]$$

for N large enough. (2) The proof of Dyson's conjecture [15], which states that the time scale of the Dyson Brownian motion to reach local equilibrium is of order $N^{-1}$ up to logarithmic corrections. (3) The edge universality holds in the sense that the probability distributions of the largest (and the smallest) eigenvalues of two generalized Wigner ensembles are the same in the large N limit provided that the second moments of the two ensembles are identical.
1 Introduction

Random matrices were introduced by E. Wigner to model the excitation spectrum of large nuclei. The
central idea is based on the observation that the eigenvalue gap distribution for a large complicated system 
is universal in the sense that it depends only on the symmetry class of the physical system but not on 
other detailed structures. As a special case of this general belief, the eigenvalue gap distribution of random 
matrices should be independent of the probability distributions of the ensembles and thus is given by the 
classical Gaussian ensembles. Besides the eigenvalue gap distribution, similar predictions hold also for short 
distance correlation functions of the eigenvalues. Since the gap distribution can be expressed in terms of 
correlation functions, mathematical analysis is usually performed on correlation functions. From now on, 
we refer to universality as the fact that the short distance behavior of the eigenvalue correlation functions
of a random matrix ensemble is the same as that of the Gaussian ensemble of the same symmetry class
(Gaussian unitary, orthogonal or symplectic ensemble, i.e., GUE, GOE, GSE). 

The universality question can be roughly divided into the bulk universality in the interior of the spectrum 
and the edge universality near the spectral edges. Over the past two decades, spectacular progress on bulk 
and edge universality was made for invariant ensembles, see, e.g., [8, 12, 13, 30] and [2, 10, 11] for a review. 
For non-invariant ensembles with i.i.d. matrix elements (Standard Wigner ensembles) edge universality can 
be proved via the moment method and its various generalizations, see, e.g., [34, 37, 35]. In a striking 
contrast, the only rigorous results for the bulk universality of non-invariant Wigner ensembles were the work
by Johansson [26] and subsequent improvements [6, 27] on Gaussian divisible Hermitian ensembles, i.e., 
Hermitian ensembles of the form 

$$H_s = H_0 + sV, \tag{1.1}$$

where $H_0$ is a Wigner matrix, V is an independent standard GUE matrix and s is a fixed positive constant
independent of N. The Hermitian assumption is essential since the key formula used in [26] and the earlier 
work [9] is valid only for Hermitian ensembles. 

The bulk universality, however, was expected to hold for general classes of Wigner matrices, see Mehta's
book [28], Conjectures 1.2.1 and 1.2.2 on page 7. We will refer to these two conjectures as the Wigner-Dyson-Gaudin-Mehta
conjecture due to their pioneering work. Until a few years ago this conjecture remained
unsolved, mainly due to the fact that all existing methods on local eigenvalue statistics depended on explicit 
formulas which were not available for Wigner matrices. In a series of papers [16, 17, 18, 20, 19, 21, 22, 23], we 
developed a new approach to understand local eigenvalue statistics. This approach, in particular, led to the 
first proof [20] of the Wigner-Dyson-Gaudin-Mehta conjecture for Hermitian Wigner matrices with smooth 
distributions for the matrix elements. We now give a brief summary of this approach which motivates the 
current paper. 

The first step was to derive a local semicircle law, a precise estimate of the local eigenvalue density, down
to energy scales containing around $N^{\varepsilon}$ eigenvalues. In fact, we also obtain precise bounds on the matrix
elements of the Green function. The second step is a general approach for the universality of Gaussian
divisible ensembles by embedding the matrix (1.1) into a stochastic flow of matrices and using the fact that the
eigenvalues evolve according to a distinguished coupled system of stochastic differential equations, called 
the Dyson Brownian motion [15]. The central idea is to estimate the time to local equilibrium for the 
Dyson Brownian motion with the introduction of a new stochastic flow, the local relaxation flow, which 
locally behaves like a Dyson Brownian motion but has a faster decay to global equilibrium. This approach 
[19, 21] entirely eliminates the usage of explicit formulas and it provides a unified proof for the universality of 
Gaussian divisible ensembles for all symmetry classes. Furthermore, it also gives a conceptual interpretation 
that the origin of the universality is due to the local ergodicity of Dyson Brownian motion. 



More precisely, we will use a slightly different version of (1.1), namely 

$$H_t = e^{-t/2} H_0 + (1 - e^{-t})^{1/2}\, V \tag{1.2}$$

to ensure that the variance of $H_t$ remains independent of t. Denote by $\lambda_j$ the j-th eigenvalue of the random
matrix $H_t$, labelled in increasing order, $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$, and by $\gamma_j$ the classical location of the j-th
eigenvalue, i.e., $\gamma_j$ is defined by

$$N \int_{-\infty}^{\gamma_j} \rho_{sc}(x)\,dx = j, \qquad 1 \le j \le N, \tag{1.3}$$

where $\rho_{sc}(x) = \frac{1}{2\pi}\sqrt{(4 - x^2)_+}$ is the semicircle law. Our main result on the universality for the Dyson
Brownian motion states that, roughly speaking, the short distance correlation functions for $H_t$ at the time
$t \sim N^{-2\alpha}$ and $H_{t=\infty}$ are identical in a weak sense provided that the following main condition holds:

Assumption III. There exists an $\alpha > 0$ such that

$$\sup_{t \ge N^{-2\alpha}} \frac{1}{N}\,\mathbb{E}_t \sum_{j=1}^{N} (\lambda_j - \gamma_j)^2 \le C N^{-1-2\alpha} \tag{1.4}$$

with a constant C uniformly in N. Here $\mathbb{E}_t$ is the expectation w.r.t. Dyson Brownian motion at the time t.
The condition (1.4) has been derived from a sufficiently strong version of the local semicircle law.
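As an illustration of (1.3) and of the quantity controlled in (1.4), here is a minimal Python sketch (standard library only; all function names are hypothetical). It uses the closed form $n_{sc}(x) = \frac12 + \frac{x\sqrt{4-x^2}}{4\pi} + \frac{1}{\pi}\arcsin\frac{x}{2}$ for $|x| \le 2$, finds $\gamma_j$ by bisection, and evaluates $\frac1N\sum_j(\lambda_j - \gamma_j)^2$ for a single sample of eigenvalues. The expectation $\mathbb{E}_t$ in (1.4) would have to be estimated by averaging over many samples, which is not done here.

```python
import math
from typing import List


def semicircle_cdf(x: float) -> float:
    """n_sc(x): integral of rho_sc(y) = sqrt((4 - y^2)_+) / (2 pi) over y <= x."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def classical_locations(n: int, tol: float = 1e-12) -> List[float]:
    """Solve N * n_sc(gamma_j) = j for j = 1, ..., N by bisection, cf. (1.3)."""
    gammas = []
    for j in range(1, n + 1):
        lo, hi = -2.0, 2.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if semicircle_cdf(mid) < j / n:
                lo = mid  # gamma_j lies to the right of mid
            else:
                hi = mid
        gammas.append(0.5 * (lo + hi))
    return gammas


def mean_square_deviation(eigenvalues: List[float]) -> float:
    """(1/N) * sum_j (lambda_j - gamma_j)^2 for a single sample of eigenvalues."""
    lam = sorted(eigenvalues)
    gam = classical_locations(len(lam))
    return sum((a - b) ** 2 for a, b in zip(lam, gam)) / len(lam)
```

For instance, `classical_locations(2)` returns values close to `[0, 2]`, since $n_{sc}(0) = 1/2$ and $n_{sc}(2) = 1$.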

Once the universality for the Gaussian divisible ensemble is established, the last step is to approximate all 
matrix ensembles by Gaussian divisible ones. This step can be done via a reverse heat flow argument [20, 21] 
for ensembles with smooth probability distributions or more generally via the Green function comparison 
theorem [22] which compares the distributions of eigenvalues of two ensembles around a fixed energy. The key 
input for the latter approach was to prove a-priori estimates on the matrix elements of the Green function. 
These estimates have been obtained together with the local semicircle law. 

To summarize, our approach to universality consists of the following three main steps: Step 1. Local
semicircle law. Step 2. Universality for Gaussian divisible ensembles. Step 3. Approximation by Gaussian
divisible ensembles. Both Steps 2 and 3 rely on a strong local semicircle law from Step 1.

Shortly after the preprint [20] appeared, another method for the universality was posted by Tao and Vu
[40]. This method contains three ingredients similar to those in [20]; their key result, which appeared prior to
the Green function comparison theorem of [22], states that the probability distributions of the j-th eigenvalue of two
ensembles for a fixed label j in the bulk are identical as $N \to \infty$ provided that the first four moments of the
matrix elements of the two ensembles are identical. This result also implies the universality of the correlation
functions for Hermitian Wigner ensembles [40] by combining it with the Gaussian divisible results of [26, 6]
for Step 2. For symmetric ensembles, it requires the first four moments to match those of GOE. As in our
approach, a key analytic input for [40] is the local semicircle law established in [17]. The bulk universality
in the case of symmetric matrices in the generality stated in Mehta's book [28] (in particular, without the
assumption of matching four moments) was proved in [19, 23]. The key input is to link universality to local
ergodicity of Dyson Brownian motion, reviewed in the previous paragraphs.

Due to the fundamental role of the local semicircle law, its error estimates have been improved many times
since its first proof in [17]. Furthermore, it was extended to sample covariance ensembles [21] and generalized
Wigner ensembles [22] whose matrix elements are allowed to have different but comparable variances. The
best existing error estimates for the local semicircle law of generalized Wigner ensembles, given in [23], are
already almost optimal in the bulk of the spectrum, but not near the edges. In this paper, we will prove a
strong local semicircle law, Theorem 2.1, which, up to log N factors, gives optimal error estimates everywhere
in the spectrum. There are four important consequences of this result:

1. It implies that Assumption III holds with the right hand side of (1.4) given by $N^{-2}(\log N)^{C\log\log N}$
for some constant C, i.e., α can be chosen arbitrarily close to 1/2. Thus the Dyson Brownian motion
reaches local equilibrium at $t \sim N^{-1+\delta}$ for arbitrarily small δ. Up to the factor $N^{\delta}$, this is optimal.
Since the time to the global equilibrium for the Dyson Brownian motion is of order one, we have thus
established Dyson's conjecture [15] that the Dyson Brownian motion reaches equilibrium in two well-separated
stages with time scales of order one and $N^{-1}$. As a historical note, we mention that Dyson
had obtained the two time scales via a heuristic physical argument and commented that a rigorous proof
of his prediction was lacking. Furthermore, the notion of local equilibrium was used by Dyson in a very
vague sense, see [19] for a more detailed discussion.

2. It implies certain explicit error estimates for the universality of correlation functions in short scales. 

3. It implies the rigidity of eigenvalues in the sense that

$$\mathbb{P}\Big(\exists j:\ |\lambda_j - \gamma_j| \ge (\log N)^{C\log\log N}\,\big[\min(j, N-j+1)\big]^{-1/3} N^{-2/3}\Big) \le C\exp\big[-(\log N)^{c\log\log N}\big] \tag{1.5}$$

for some positive constants C and c. In other words, the eigenvalue is near its classical location with
an error of at most $N^{-1}(\log N)^{C\log\log N}$ for generalized Wigner matrices in the bulk and the estimate
deteriorates by a factor $(N/j)^{1/3}$ near the edge $j \ll N$.

4. It implies the edge universality in the sense that the probability distributions of the largest (and the 
smallest) eigenvalues of two generalized Wigner ensembles are equal in the large N limit provided 
that the second moments of the two ensembles are identical. We recall the standard assumption that 
the first moments of the matrix elements are always zero for all generalized Wigner ensembles. The 
comparison between our edge universality theorem and the previous results will be given at the end of 
Section 2 after the statement of Theorem 2.4. 

It is well known that the gaps between extremal eigenvalues and their fluctuations are of order $N^{-2/3}$.
Thus the edge deterioration factor in (1.5) is the natural interpolation between $N^{-1}$ in the bulk and $N^{-2/3}$
at the edges. The surprising feature of the rigidity estimate is that the probability of even a single eigenvalue
lying at a slightly wrong location is already extremely small. We remark that, without the $(\log N)^{C\log\log N}$
factor, the rigidity estimate (1.5) would be wrong since, at least for the classical GUE or GOE ensembles, the
eigenvalues are known to fluctuate on a scale $\sqrt{\log N}/N$, see [25, 29]. For these ensembles, the distribution
of $\lambda_j - \gamma_j$ is Gaussian in the bulk. However, the rigidity estimate (1.5) in this strong probabilistic form was
not available even for the classical Gaussian ensembles.

2 Main results 

Let $H = (h_{ij})_{i,j=1}^N$ be an N × N hermitian or symmetric matrix where the matrix elements $h_{ij} = \overline{h_{ji}}$, $i \le j$,
are independent random variables given by a probability measure $\nu_{ij}$ with mean zero and variance $\sigma_{ij}^2 \ge 0$:

$$\mathbb{E}\,h_{ij} = 0, \qquad \sigma_{ij}^2 := \mathbb{E}|h_{ij}|^2. \tag{2.1}$$

The distribution $\nu_{ij}$ and its variance $\sigma_{ij}^2$ may depend on N, but we omit this fact in the notation. Denote by
$B := (\sigma_{ij}^2)_{i,j=1}^N$ the matrix of variances. The following assumptions on B are made throughout the paper:

(A) For any j fixed

$$\sum_{i=1}^N \sigma_{ij}^2 = 1. \tag{2.2}$$

Thus B is symmetric and doubly stochastic and, in particular, it satisfies $-1 \le B \le 1$.

(B) We assume that there exist two positive constants, $\delta_-$ and $\delta_+$, independent of N, such that

$$1 \text{ is a simple eigenvalue of } B \quad\text{and}\quad \operatorname{Spec}(B) \subset [-1 + \delta_-,\ 1 - \delta_+] \cup \{1\}. \tag{2.3}$$

(C) There is a constant $C_0$, independent of N, such that

$$\max_{i,j} \sigma_{ij}^2 \le \frac{C_0}{N}. \tag{2.4}$$

For the orientation of the reader, we mention two special cases that provided the main motivation for 
our work. 

Example 1. Generalized Wigner matrix. Define $C_{\inf}(N)$ and $C_{\sup}(N)$ by

$$C_{\inf}(N) := \inf_{i,j}\{N\sigma_{ij}^2\} \le \sup_{i,j}\{N\sigma_{ij}^2\} =: C_{\sup}(N). \tag{2.5}$$

The ensemble is called a generalized Wigner ensemble provided that

$$0 < C_- \le C_{\inf}(N) \le C_{\sup}(N) \le C_+ < \infty \tag{2.6}$$

for some $C_\pm$ independent of N. In this case, one can easily prove that 1 is a simple eigenvalue of B and
(2.3) holds with some

$$\delta_\pm \ge C_-, \tag{2.7}$$

i.e., apart from the trivial eigenvalue, the spectrum of B is separated away from ±1 by positive constants that
are independent of N. The special case $C_{\inf} = C_{\sup} = 1$ reduces to the standard Wigner matrices.
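The conditions of Example 1 are easy to test on a concrete variance matrix. The following sketch (hypothetical helper name, plain nested lists for B) checks symmetry and the normalization (2.2) up to a tolerance and returns $C_{\inf}(N)$ and $C_{\sup}(N)$ from (2.5). Note that (2.6) is a statement uniform in N, so a single finite-N computation can only illustrate it, not verify it.

```python
from typing import List, Tuple


def variance_profile_constants(sigma2: List[List[float]], tol: float = 1e-12) -> Tuple[float, float]:
    """Check that B = (sigma_ij^2) is symmetric with unit column sums, cf. (2.2),
    and return (C_inf(N), C_sup(N)) as defined in (2.5)."""
    n = len(sigma2)
    if any(len(row) != n for row in sigma2):
        raise ValueError("B must be a square matrix")
    for i in range(n):
        for j in range(n):
            if abs(sigma2[i][j] - sigma2[j][i]) > tol:
                raise ValueError("B must be symmetric")
    for j in range(n):
        column_sum = sum(sigma2[i][j] for i in range(n))
        if abs(column_sum - 1.0) > tol:
            raise ValueError(f"column {j} sums to {column_sum}, violating (2.2)")
    # C_inf and C_sup are the extreme values of N * sigma_ij^2.
    scaled = [n * s for row in sigma2 for s in row]
    return min(scaled), max(scaled)
```

For the standard Wigner profile $\sigma_{ij}^2 = 1/N$ this returns `(1.0, 1.0)`, matching the special case $C_{\inf} = C_{\sup} = 1$.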

Example 2. Certain band matrices with bandwidth of order N. Band matrices are characterized by the
property that $\sigma_{ij}^2$ is a function of $|i - j|$ on scale W, which is called the bandwidth. More precisely, the
variances of a band matrix with bandwidth $1 \le W \le N/2$ are given by

$$\sigma_{ij}^2 = \frac{1}{W}\, f\Big(\frac{[i-j]_N}{W}\Big), \tag{2.8}$$

where $f: \mathbb{R} \to \mathbb{R}_+$ is a bounded nonnegative symmetric function with $\int f = 1$ and we defined $[i-j]_N \in \mathbb{Z}$
by the property that $[i-j]_N \equiv i - j \pmod N$ and $-\frac{1}{2}N < [i-j]_N \le \frac{1}{2}N$. We often consider the case
when W = W(N), i.e. the bandwidth is a function of N. The condition (A) holds only asymptotically as
$W(N) \to \infty$, but it can be remedied by an irrelevant rescaling. If the bandwidth is comparable with N, then
we also have to assume that f(x) is supported in $|x| \le N/(2W)$.



It is easy to see that many band matrices satisfy the spectral assumption (2.3). The lower spectral bound,
$-1 + \delta_- \le B$ with some $\delta_- > 0$ depending only on f, holds for any sufficiently large W, see Appendix A
of [21]. The parameter $\delta_+$ in the upper spectral bound is typically of order $(W/N)^2$. Thus, for
the condition (B) to hold, we need to assume that the bandwidth is comparable with N, i.e., it satisfies
$W \ge cN$ with some positive constant c. The same assumption also guarantees that condition (C) holds.

We remark that the special case W = N/2 and $f(x) \ge c > 0$ for $|x| \le 1$ was already covered by Example
1, but Example 2 allows more general band matrices that may have vanishing variances. For example, with
the choice $f(x) = \frac{1}{2}\,\mathbf{1}(|x| \le 1)$, the ensemble with variances

$$\sigma_{ij}^2 = (N/2)^{-1}\,\mathbf{1}\big(|[i-j]_N| \le N/4\big) \tag{2.9}$$

is a band matrix with bandwidth W = N/4.
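To see why condition (A) holds only asymptotically for band matrices, the sketch below builds the variances (2.8) with the profile used in (2.9) (all names are hypothetical). For $W = N/4$ each row contains $N/2 + 1$ entries equal to $2/N$, so every row sum equals $(N/2+1)/(N/2) = 1 + 2/N$; dividing by this constant is the "irrelevant rescaling" mentioned in Example 2.

```python
from typing import Callable, List


def periodic_difference(i: int, j: int, n: int) -> int:
    """[i - j]_N: the representative of i - j mod N lying in (-N/2, N/2]."""
    d = (i - j) % n  # Python's % returns a value in [0, N)
    if d > n / 2:
        d -= n
    return d


def band_variances(n: int, w: float, f: Callable[[float], float]) -> List[List[float]]:
    """sigma_ij^2 = W^{-1} f([i - j]_N / W), cf. (2.8)."""
    return [[f(periodic_difference(i, j, n) / w) / w for j in range(n)] for i in range(n)]


def half_indicator(x: float) -> float:
    """f(x) = (1/2) * 1(|x| <= 1), the profile behind (2.9)."""
    return 0.5 if abs(x) <= 1.0 else 0.0
```

With `band_variances(8, 2.0, half_indicator)` every row sums to exactly 1.25, i.e. $1 + 2/N$ for $N = 8$.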

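Define the Green function of H by

$$G_{ij}(z) = \Big(\frac{1}{H - z}\Big)_{ij}, \qquad z = E + i\eta,\ E \in \mathbb{R},\ \eta > 0. \tag{2.10}$$

The Stieltjes transform of the empirical eigenvalue distribution of H is given by

$$m(z) = m_N(z) := \frac{1}{N}\sum_j G_{jj}(z) = \frac{1}{N}\operatorname{Tr}\frac{1}{H - z}. \tag{2.11}$$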



Define $m_{sc}(z)$ as the unique solution of

$$m_{sc}(z) + \frac{1}{z + m_{sc}(z)} = 0, \tag{2.12}$$

with positive imaginary part for all z with $\operatorname{Im} z > 0$, i.e.

$$m_{sc}(z) = \frac{-z + \sqrt{z^2 - 4}}{2}, \tag{2.13}$$

where the square root function is chosen with a branch cut in the segment [−2, 2] so that asymptotically
$\sqrt{z^2 - 4} \sim z$ at infinity. This guarantees that the imaginary part of $m_{sc}$ is non-negative for $\eta = \operatorname{Im} z > 0$
and in the $\eta \to 0$ limit it is the Wigner semicircle distribution

$$\rho_{sc}(E) := \lim_{\eta \to 0+} \frac{1}{\pi}\operatorname{Im} m_{sc}(E + i\eta) = \frac{1}{2\pi}\sqrt{(4 - E^2)_+}. \tag{2.14}$$
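The branch choice in (2.13) is easy to get wrong numerically. One way to realize it, assumed here rather than taken from the paper, is to write $\sqrt{z^2-4} = \sqrt{z-2}\,\sqrt{z+2}$ with principal square roots: this product has its cut exactly on $[-2,2]$ and behaves like $z$ at infinity. The sketch below (hypothetical function names) checks the self-consistent equation (2.12) and recovers the density (2.14) from a small imaginary part.

```python
import cmath
import math


def m_sc(z: complex) -> complex:
    """Stieltjes transform of the semicircle law, (2.13), for Im z > 0.

    sqrt(z - 2) * sqrt(z + 2) with principal square roots has its branch
    cut on [-2, 2] and behaves like z at infinity."""
    if z.imag <= 0:
        raise ValueError("m_sc is evaluated in the upper half plane")
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def rho_sc(e: float) -> float:
    """Semicircle density (2.14)."""
    return math.sqrt(max(4.0 - e * e, 0.0)) / (2.0 * math.pi)
```

For example, $m_{sc}(i) = i(\sqrt5 - 1)/2$, and $\frac{1}{\pi}\operatorname{Im} m_{sc}(E + i\eta)$ approaches `rho_sc(E)` as $\eta \to 0$.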

The Wigner semicircle law [45] states that $m_N(z) \to m_{sc}(z)$ for any fixed z, i.e., provided that $\eta = \operatorname{Im} z > 0$
is independent of N. Let $z = E + i\eta$ ($\eta > 0$) and denote by $\kappa := \big||E| - 2\big|$ the distance of E to the spectral
edges ±2. We have proved [23] a local version of this result for generalized Wigner matrices in the form of
the following probability estimate:

$$\mathbb{P}\Big(|m_N(z) - m_{sc}(z)| \ge \frac{N^{\varepsilon}}{N\eta\kappa}\Big) \le \frac{C}{N^{K}} \tag{2.15}$$

that holds for any fixed positive constants ε and K and for any $z = E + i\eta$ such that $|E| \le 10$, $N\eta\kappa^{3/2} \ge N^{\varepsilon}$.
Note that this estimate deteriorates near the spectral edges as $\kappa \to 0$.



In this paper we prove the following local semicircle law that provides essentially the optimal estimate
uniformly in $E = \operatorname{Re} z$. We will estimate not only the deviation of m(z) from $m_{sc}(z)$, but also the deviation of
each diagonal matrix element of the resolvent, $G_{kk}(z)$, from $m_{sc}(z)$. Moreover, we show that the off-diagonal
elements of the resolvent are small.

Let

$$v_k := G_{kk} - m_{sc}, \qquad m := \frac{1}{N}\sum_{k=1}^N G_{kk}, \qquad [v] := \frac{1}{N}\sum_{k=1}^N v_k.$$

Our goal is to estimate the following quantities

$$\Lambda_d := \max_k |v_k| = \max_k |G_{kk} - m_{sc}|, \qquad \Lambda_o := \max_{k \ne \ell} |G_{k\ell}|, \qquad \Lambda := |m - m_{sc}|, \tag{2.16}$$

where the subscripts refer to "diagonal" and "off-diagonal" matrix elements. All these quantities depend on
the spectral parameter z and on N but for simplicity we often omit this fact from the notation.
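For small matrices the quantities in (2.16) can be computed directly. The sketch below (hypothetical names; a dense Gauss-Jordan inversion with partial pivoting, not a method used in the paper) forms $G(z) = (H - z)^{-1}$ and returns $(\Lambda_d, \Lambda_o, \Lambda)$. The matrix $H - z$ is invertible for $\operatorname{Im} z > 0$ because H is hermitian.

```python
import cmath
from typing import List, Sequence, Tuple


def resolvent(h: Sequence[Sequence[complex]], z: complex) -> List[List[complex]]:
    """G(z) = (H - z)^{-1} via Gauss-Jordan elimination with partial pivoting."""
    n = len(h)
    # Augmented matrix [H - z | I].
    a = [
        [complex(h[i][j]) - (z if i == j else 0) for j in range(n)]
        + [complex(1 if k == i else 0) for k in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def m_sc(z: complex) -> complex:
    """(2.13) with principal square roots; valid for Im z > 0."""
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def green_function_deviations(h: Sequence[Sequence[complex]], z: complex) -> Tuple[float, float, float]:
    """Return (Lambda_d, Lambda_o, Lambda) from (2.16) for one matrix H and one z."""
    g = resolvent(h, z)
    n = len(g)
    msc = m_sc(z)
    lam_d = max(abs(g[k][k] - msc) for k in range(n))
    lam_o = max((abs(g[k][l]) for k in range(n) for l in range(n) if k != l), default=0.0)
    lam = abs(sum(g[k][k] for k in range(n)) / n - msc)
    return lam_d, lam_o, lam
```

For $H = \begin{pmatrix}0&1\\1&0\end{pmatrix}$ and $z = i$ one gets $G_{11} = i/2$ and $G_{12} = 1/2$, so $\Lambda_o = 1/2$.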

Theorem 2.1 (Strong local semicircle law) Let $H = (h_{ij})$ be a hermitian or symmetric N × N random
matrix, $N \ge 3$, with $\mathbb{E}\,h_{ij} = 0$, $1 \le i, j \le N$, and assume that the variances $\sigma_{ij}^2$ satisfy Assumptions (A),
(B), (C), i.e. (2.2), (2.3) and (2.4). Suppose that the distributions of the matrix elements have a uniformly
subexponential decay in the sense that there exists a constant $\vartheta > 0$, independent of N, such that for any
$x \ge 1$ and $1 \le i, j \le N$ we have

$$\mathbb{P}\big(|h_{ij}| \ge x\,\sigma_{ij}\big) \le \vartheta^{-1}\exp(-x^{\vartheta}). \tag{2.17}$$

Then there exist positive constants $A_0 > 1$, C, c and $\phi < 1$ depending only on ϑ, on $\delta_\pm$ from Assumption
(B) and on $C_0$ from Assumption (C), such that for all L with

$$A_0\log\log N \le L \le \frac{\log(10N)}{10\log\log N} \tag{2.18}$$

the following estimates hold for any sufficiently large $N \ge N_0(\vartheta, \delta_\pm, C_0)$:

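(i) The Stieltjes transform of the empirical eigenvalue distribution of H satisfies

$$\mathbb{P}\Big(\bigcup_{z \in S_L}\Big\{\Lambda(z) \ge \frac{(\log N)^{4L}}{N\eta}\Big\}\Big) \le C\exp\big[-c(\log N)^{\phi L}\big], \tag{2.19}$$

where

$$S_L := \Big\{z = E + i\eta:\ |E| \le 5,\ N^{-1}(\log N)^{10L} < \eta \le 10\Big\}. \tag{2.20}$$

(ii) The individual matrix elements of the Green function satisfy

$$\mathbb{P}\Big(\bigcup_{z \in S_L}\Big\{\Lambda_d(z) + \Lambda_o(z) \ge (\log N)^{4L}\sqrt{\frac{\operatorname{Im} m_{sc}(z)}{N\eta}} + \frac{(\log N)^{4L}}{N\eta}\Big\}\Big) \le C\exp\big[-c(\log N)^{\phi L}\big]. \tag{2.21}$$

(iii) The largest eigenvalue of H is bounded by $2 + N^{-2/3}(\log N)^{9L}$ in the sense that

$$\mathbb{P}\Big(\max_{j=1,\ldots,N}|\lambda_j| \ge 2 + N^{-2/3}(\log N)^{9L}\Big) \le C\exp\big[-c(\log N)^{\phi L}\big]. \tag{2.22}$$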



The subexponential decay condition (2.17) can be weakened if we are not aiming at error estimates faster
than any power law of N. This can be easily carried out and we will not pursue it in this paper. We also
note that the upper bound on L originates from the natural requirement that $S_L \ne \emptyset$.
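The upper limit in (2.18) is exactly the value of L at which the η-range in (2.20) collapses: since $(\log N)^{10L} = e^{10L\log\log N}$, at $L = \log(10N)/(10\log\log N)$ the lower bound $N^{-1}(\log N)^{10L}$ equals 10, the upper bound on η. The following sketch (hypothetical names) checks this numerically. The lower constraint $A_0\log\log N \le L$ involves the unspecified constant $A_0$ and is not modelled.

```python
import math


def eta_lower(n: int, big_l: float) -> float:
    """Lower edge N^{-1} (log N)^{10L} of the eta-range of S_L, cf. (2.20)."""
    return math.log(n) ** (10 * big_l) / n


def l_upper(n: int) -> float:
    """Upper end log(10N) / (10 log log N) of the range (2.18); needs N >= 3."""
    return math.log(10 * n) / (10 * math.log(math.log(n)))


def in_s_l(z: complex, n: int, big_l: float) -> bool:
    """Membership test for S_L: |E| <= 5 and N^{-1}(log N)^{10L} < eta <= 10."""
    return abs(z.real) <= 5 and eta_lower(n, big_l) < z.imag <= 10
```

Because the inequality on the left of the η-range is strict, `in_s_l` is false for every z when `big_l` equals `l_upper(n)`.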

Prior to our results in [22] and [23], a central limit theorem for the semicircle law on macroscopic scale for 
band matrices was established by Guionnet [24] and Anderson and Zeitouni [3]; a semicircle law for Gaussian
band matrices was proved by Disertori, Pinson and Spencer [14]. For a review on band matrices, see the 
recent article [39] by Spencer. 

The local semicircle estimates imply that the empirical counting function of the eigenvalues is close to
the semicircle counting function and that the locations of the eigenvalues are close to their classical locations
in the mean square deviation sense. Recall that $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_N$ are the ordered eigenvalues of H. We define
the normalized empirical counting function by

$$n(E) := \frac{1}{N}\#\{\lambda_j \le E\}. \tag{2.23}$$

Let

$$n_{sc}(E) := \int_{-\infty}^{E}\rho_{sc}(x)\,dx \tag{2.24}$$

be the distribution function of the semicircle law and recall that $\gamma_j = \gamma_{j,N}$ denotes the classical location of
the j-th point under the semicircle law, see (1.3).
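The supremum in (2.26) reduces to a finite computation, the same one used for a Kolmogorov-Smirnov statistic. Because n(E) is a right-continuous step function and $n_{sc}$ is continuous and non-decreasing, the supremum over all real E is attained at, or just before, a jump of n. The sketch below (hypothetical names) therefore compares $n_{sc}(\lambda_j)$ with $(j-1)/N$ and $j/N$. It computes the supremum over all real E. This agrees with the supremum over $|E| \le 5$ in (2.26) whenever all eigenvalues lie in $[-5, 5]$, because then $n$ and $n_{sc}$ are both 0 to the left of that interval and both 1 to the right of it.

```python
import math
from typing import Sequence


def semicircle_cdf(x: float) -> float:
    """n_sc(x) from (2.24), in closed form."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def counting_function_deviation(eigenvalues: Sequence[float]) -> float:
    """sup over real E of |n(E) - n_sc(E)|, with n(E) = #{j : lambda_j <= E} / N."""
    lam = sorted(eigenvalues)
    n = len(lam)
    worst = 0.0
    for j, x in enumerate(lam, start=1):
        c = semicircle_cdf(x)
        # Value of n at the jump (j/N) and just before it ((j-1)/N).
        worst = max(worst, abs(j / n - c), abs((j - 1) / n - c))
    return worst
```

For the two points −1 and 1 the result is $\frac16 + \frac{\sqrt3}{4\pi} \approx 0.305$.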

Theorem 2.2 (Rigidity of Eigenvalues) Suppose that Assumptions (A), (B), (C) and the condition
(2.17) hold. Then there exist positive constants $A_0 > 1$, C, c and $\phi < 1$ depending only on ϑ, on $\delta_\pm$ from
Assumption (B) and on $C_0$ from Assumption (C) such that for any L with

$$A_0\log\log N \le L \le \frac{\log(10N)}{10\log\log N}$$

we have

$$\mathbb{P}\Big(\exists j:\ |\lambda_j - \gamma_j| \ge (\log N)^{L}\,\big[\min(j, N-j+1)\big]^{-1/3} N^{-2/3}\Big) \le C\exp\big[-c(\log N)^{\phi L}\big] \tag{2.25}$$

and

$$\mathbb{P}\Big(\sup_{|E| \le 5}|n(E) - n_{sc}(E)| \ge \frac{(\log N)^{L}}{N}\Big) \le C\exp\big[-c(\log N)^{\phi L}\big] \tag{2.26}$$

for any sufficiently large $N \ge N_0(\vartheta, \delta_\pm, C_0)$.
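The threshold in (2.25) interpolates between the bulk and edge scales: for $j$ near $N/2$ it is of order $(\log N)^L N^{-1}$, while for $j = 1$ or $j = N$ it is $(\log N)^L N^{-2/3}$. The sketch below (hypothetical names) evaluates this threshold and, for one sample, the largest ratio of $|\lambda_j - \gamma_j|$ to the threshold. A ratio of at least 1 means the event inside the probability in (2.25) occurred for that sample. The classical locations are passed in, for example as computed from (1.3).

```python
import math
from typing import Sequence


def rigidity_scale(j: int, n: int, big_l: float) -> float:
    """(log N)^L [min(j, N - j + 1)]^{-1/3} N^{-2/3}: the threshold in (2.25)."""
    return math.log(n) ** big_l * min(j, n - j + 1) ** (-1.0 / 3.0) * n ** (-2.0 / 3.0)


def worst_rigidity_ratio(eigenvalues: Sequence[float], gammas: Sequence[float], big_l: float) -> float:
    """max_j |lambda_j - gamma_j| / rigidity_scale(j, N, L) for one sample."""
    lam = sorted(eigenvalues)
    n = len(lam)
    if len(gammas) != n:
        raise ValueError("need one classical location per eigenvalue")
    return max(
        abs(x - g) / rigidity_scale(j, n, big_l)
        for j, (x, g) in enumerate(zip(lam, gammas), start=1)
    )
```

For $N = 1000$ and $L = 0$ the threshold is $0.01$ at the edge and $2^{1/3}/1000$ at $j = 500$.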

For standard Wigner matrices, (2.26) with the factor $N^{-1}$ replaced by $N^{-2/5}$ (in a weaker sense with
some modifications in the statement) was established in [5] and a stronger $N^{-1/2}$ control was proven for
$\mathbb{E}\,n(E) - n_{sc}(E)$. If the $(\log N)^L$ factor is replaced by $N^{\delta}$ for arbitrary δ > 0, (2.26) was proved in [23] (Theorem
6.3) with some deterioration near the spectral edges and with a slightly weaker probability estimate. In
Theorem 1.3 of a recent preprint [42], the following estimate (in our scaling)

$$\Big(\mathbb{E}\,|\lambda_j - \gamma_j|^2\Big)^{1/2} \le \big[\min(j, N-j+1)\big]^{-1/3} N^{-1/6-\varepsilon_0} \tag{2.27}$$

with some small positive $\varepsilon_0$ was proved under the assumption that the third moment of the matrix element
vanishes and all variances of the matrix elements are identical, i.e., for the standard Wigner matrices with
vanishing third moment. In the same paper, it was conjectured that the factor $N^{-1/6-\varepsilon_0}$ on the right hand
side of (2.27) should be replaced by $N^{-2/3+\varepsilon}$. Prior to the work [42], the estimate (2.25) away from the
edges with a slightly weaker probability estimate and with the $(\log N)^L$ factor replaced by $N^{\delta}$ for arbitrary
δ > 0 was proved in [23] (see the equation before (7.8) in [23]). For Wigner matrices whose matrix element
distributions match the standard Gaussian random variable up to the third moment, it was proved in
[40] that $|\lambda_j - \gamma_j| \le N^{-1+\varepsilon}$ holds in the bulk in probability (Theorem 32). More detailed behavior can
be obtained if one assumes further that the fourth moment also matches the standard Gaussian random
variable, see Corollary 21 of [40] for more details. Near the edge, (2.25) with $N^{-2/3}$ replaced by $N^{-1/2}$ and
the probability estimate on the right side replaced by a Gaussian type estimate was proved in [1].

We remark that all results in this paper are stated for both the hermitian and the symmetric case, but the
statements hold for quaternion self-dual random matrices as well (see, e.g., Section 3.1 of [21]). The proofs
will be presented for the hermitian case for definiteness, but with obvious modifications they are valid for the
other two cases as well.

We will frequently use the notation C and c for generic positive constants and $N_0$ for the lower threshold
for N in this paper. We adopt the convention that, unless stated otherwise, these constants and also the
implicit constants in the O(·) notation may depend on the basic parameters of our model, namely on ϑ, $\delta_\pm$
and $C_0$. The values of these generic constants may change from line to line.

2.1 Bulk Universality 

We now use Theorem 2.2 to establish the speed of convergence for local statistics of Dyson Brownian motion.
In fact, we will replace the Brownian motion in the definition of Dyson Brownian motion by an Ornstein-Uhlenbeck
process. We thus consider a flow of random matrices $H_t$ satisfying the following matrix valued
stochastic differential equation

$$dH_t = \frac{1}{\sqrt{N}}\,d\beta_t - \frac{1}{2}H_t\,dt, \tag{2.28}$$



where $\beta_t$ is a hermitian matrix valued process whose diagonal matrix elements are standard real Brownian
motions and the off-diagonal elements are independent standard complex Brownian motions, with all Brownian
motions being independent. The initial condition $H_0$ is the original hermitian Wigner matrix. For any
fixed t > 0, the distribution of $H_t$ coincides with that of

$$e^{-t/2}H_0 + (1 - e^{-t})^{1/2}\,V, \tag{2.29}$$

where V is an independent GUE matrix whose matrix elements are centered Gaussian random variables with
variance 1/N. For the symmetric case, the matrix elements of $\beta_t$ in (2.28) are real Brownian motions and
V in (2.29) is a GOE matrix. It is well known that the eigenvalues of $H_t$ follow a process that is also called
the Dyson Brownian motion (in our case with a drift, but we will still call it Dyson Brownian motion).
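The variance bookkeeping behind (2.29), and behind the remark after (1.2) that the variance of $H_t$ does not change with t, can be checked entrywise. If an entry of $H_0$ has variance $\sigma^2$, the corresponding entry of (2.29) has variance $e^{-t}\sigma^2 + (1 - e^{-t})/N$. This equals $1/N$ when $\sigma^2 = 1/N$, and under (2.2) it keeps all column sums equal to 1. The sketch below (hypothetical names) compares this formula with a Monte Carlo estimate for a real, symmetric-case entry. It takes $H_0$ Gaussian purely for convenience, since the formula only uses independence and the second moments.

```python
import math
import random


def entry_variance(t: float, var0: float, n: int) -> float:
    """Variance of an entry of e^{-t/2} H_0 + (1 - e^{-t})^{1/2} V, where the
    H_0 entry has variance var0 and the V entry has variance 1/N, cf. (2.29)."""
    return math.exp(-t) * var0 + (1.0 - math.exp(-t)) / n


def sampled_entry_variance(t: float, var0: float, n: int,
                           samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of entry_variance for a real Gaussian H_0 entry."""
    rng = random.Random(seed)
    a = math.exp(-t / 2.0)
    b = math.sqrt(1.0 - math.exp(-t))
    total = 0.0
    for _ in range(samples):
        x = a * rng.gauss(0.0, math.sqrt(var0)) + b * rng.gauss(0.0, math.sqrt(1.0 / n))
        total += x * x
    return total / samples
```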
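More precisely, let

$$\mu = \mu_N(d\mathbf{x}) = \frac{e^{-\mathcal{H}(\mathbf{x})}}{Z_\beta}\,d\mathbf{x}, \qquad \mathcal{H}(\mathbf{x}) = N\Big[\beta\sum_{i=1}^N \frac{x_i^2}{4} - \frac{\beta}{N}\sum_{i<j}\log|x_j - x_i|\Big] \tag{2.30}$$

be the probability measure of the eigenvalues of the general β ensemble, with $\beta \ge 1$ (β = 2 for GUE, β = 1
for GOE). Here $Z_\beta$ is the normalization factor so that μ is a probability measure. In this section, we often use
the notation $x_j$ instead of $\lambda_j$ for the eigenvalues to follow the notations of [21]. Denote the distribution of
the eigenvalues at time t by $f_t(\mathbf{x})\mu(d\mathbf{x})$. Then $f_t$ satisfies

$$\partial_t f_t = \mathcal{L} f_t, \tag{2.31}$$

where

$$\mathcal{L} = \sum_{i=1}^N \frac{1}{2N}\partial_i^2 + \sum_{i=1}^N\Big(-\frac{\beta}{4}x_i + \frac{\beta}{2N}\sum_{j \ne i}\frac{1}{x_i - x_j}\Big)\partial_i. \tag{2.32}$$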
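One can check that the drift in (2.32) equals $-\frac{1}{2N}\partial_i\mathcal{H}$ with $\mathcal{H}$ from (2.30). Equivalently, $\mathcal{L} = \frac{1}{2N}e^{\mathcal{H}}\nabla\cdot(e^{-\mathcal{H}}\nabla)$, which is why μ is the invariant (reversible) measure of the flow. The sketch below (hypothetical names, plain Python) compares the explicit drift with a central finite difference of $\mathcal{H}$ at a configuration of distinct points.

```python
import math
from typing import List, Sequence


def hamiltonian(x: Sequence[float], beta: float) -> float:
    """H(x) = N [ beta * sum_i x_i^2 / 4 - (beta / N) * sum_{i<j} log|x_j - x_i| ], cf. (2.30)."""
    n = len(x)
    confinement = beta * sum(xi * xi for xi in x) / 4.0
    repulsion = sum(math.log(abs(x[j] - x[i])) for i in range(n) for j in range(i + 1, n))
    return n * (confinement - beta / n * repulsion)


def dbm_drift(x: Sequence[float], beta: float) -> List[float]:
    """Drift coefficients of the generator (2.32)."""
    n = len(x)
    return [
        -beta * x[i] / 4.0
        + beta / (2.0 * n) * sum(1.0 / (x[i] - x[j]) for j in range(n) if j != i)
        for i in range(n)
    ]


def drift_from_hamiltonian(x: Sequence[float], beta: float, h: float = 1e-6) -> List[float]:
    """-(1 / 2N) * dH/dx_i, approximated by central differences."""
    n = len(x)
    out = []
    for i in range(n):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        derivative = (hamiltonian(xp, beta) - hamiltonian(xm, beta)) / (2.0 * h)
        out.append(-derivative / (2.0 * n))
    return out
```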






For any n ≥ 1 we define the n-point correlation functions (marginals) of the probability measure $f_t\,d\mu$ by

$$p^{(n)}_{t,N}(x_1, x_2, \ldots, x_n) = \int_{\mathbb{R}^{N-n}} f_t(\mathbf{x})\,\mu(\mathbf{x})\,dx_{n+1}\ldots dx_N. \tag{2.33}$$

With a slight abuse of notation, we will sometimes also use μ to denote the density of the measure μ with
respect to the Lebesgue measure. The correlation functions of the equilibrium measure are denoted by

$$p^{(n)}_{\mu,N}(x_1, x_2, \ldots, x_n) = \int_{\mathbb{R}^{N-n}} \mu(\mathbf{x})\,dx_{n+1}\ldots dx_N. \tag{2.34}$$

The main result in [21] concerning Dyson Brownian motion, Theorem 2.1, states that the local ergodicity
of Dyson Brownian motion holds for $t \ge N^{-2\alpha+\delta}$ for any δ > 0 provided that Assumption III (1.4) holds.
In fact, the estimate on the relaxation to the local equilibrium [21] is not restricted to Dyson Brownian motion;
it applies to all flows satisfying four general assumptions, labelled as Assumptions I-IV in [21]. Instead of
repeating these assumptions in their general forms, we will give only simple sufficient conditions. Assumption
I requires that the probability density of the global equilibrium measure is given by a Hamiltonian of the
form

$$\mathcal{H} = \mathcal{H}_N(\mathbf{x}) = \beta\Big[\sum_{j=1}^N U(x_j) - \frac{1}{N}\sum_{i<j}\log|x_i - x_j|\Big], \tag{2.35}$$

where $\beta \ge 1$ and the function $U: \mathbb{R} \to \mathbb{R}$ is smooth with $U'' \ge \delta$ for some positive δ. This is clearly satisfied
since the equilibrium measures are either GUE or GOE in the setting of this paper. Assumption II requires a
limiting continuous density for the eigenvalue distribution. In our case, the density is given by the semicircle
law. Assumption IV asserts that the local density of eigenvalues is bounded down to scale $\eta = N^{-1+a}$
for any a > 0. This assumption follows from the large deviation estimate (2.19) since a bound on Λ(z),
$z = E + i\eta$, can easily be used to prove an upper bound on the local density of eigenvalues in a window of
size η about E. As usual, the additional condition in [21] on the entropy, $S_\mu(f_{t_0}) \le CN^m$ for $t_0 = N^{-2\alpha}$,
holds due to the regularization property of the Ornstein-Uhlenbeck process. Thus for a given $0 < \varepsilon' < 1$,
choosing $\alpha = 1/2 - \varepsilon'/2$, $A = \varepsilon'$ in the second part of Theorem 2.1 in [21] and using (2.25), we have the
following theorem.

Theorem 2.3 (Strong local ergodicity of Dyson Brownian motion) Let H be a hermitian or symmetric
N × N random matrix with $\mathbb{E}\,h_{ij} = 0$ and suppose that Assumptions (A), (B), (C) and (2.17) hold
with parameters $\delta_\pm$, $C_0$ and ϑ. Then for any positive numbers ε', δ and c, for any integer n ≥ 1
and for any compactly supported continuous test function $O: \mathbb{R}^n \to \mathbb{R}$ there exists a constant C depending
on all these parameters and on O such that

i-E+b igv /■ 

dai . . . da n 0(a\ , . . . , a r , 



sup 



E-b 



2t J„ • " v "'e(E)< 



(« - &) (* + s&f • - ■* + sSsO I s on-' [,-w-iw + »-■/>*-«■ 



Ng{E) 7 ' Ng{E)J 
(2.36)



holds for any fixed $E \in [-2 + c, 2 - c]$ and for any $b = b_N \in (0, c/2)$ that may depend on N. Here $p^{(n)}_{t,N}$
and $p^{(n)}_{\mu,N}$, (2.33)-(2.34), are the correlation functions of the eigenvalues of the Dyson Brownian motion flow
(2.29) and those of the equilibrium measure, respectively.

A weaker version of Theorem 2.3 was proved in [23], and a similar result, with no error estimate,
was obtained in [20] for the hermitian case by using an explicit formula related to Johansson's formula [26].
Theorem 2.3, however, contains explicit estimates and is valid for a much larger time range than the previous
results. In particular, we mention the following three special cases:

• If we choose $\delta = 1 - 2\varepsilon'$ and thus $t = N^{-\varepsilon'}$, then we can choose $b \sim N^{-1}$ and the universality is valid
with essentially no averaging in E.

• If we choose the energy window of size $b \sim 1$ and the time $t = N^{-\delta}$, then the error estimate is of order

• If we choose $b \sim 1$, then the smallest time scale for which we can prove the universality is $t = N^{-1+\varepsilon'}$.
This scale, up to the arbitrarily small exponent ε', is optimal in accordance with the time scale to local
equilibrium conjectured by Dyson [15].

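For generalized Wigner matrices with a subexponential decay, i.e. assuming (2.6) in addition to the
conditions of Theorem 2.3, the universality result with no explicit error estimate holds for any time t ≥ 0.
More precisely, for any fixed b > 0 we have

$$\lim_{N \to \infty}\sup_{t \ge 0}\Big|\int_{E-b}^{E+b}\frac{dE'}{2b}\int_{\mathbb{R}^n} d\alpha_1\ldots d\alpha_n\,O(\alpha_1, \ldots, \alpha_n)\,\frac{1}{\rho_{sc}(E)^n}\big(p^{(n)}_{t,N} - p^{(n)}_{\mu,N}\big)\Big(E' + \frac{\alpha_1}{N\rho_{sc}(E)}, \ldots, E' + \frac{\alpha_n}{N\rho_{sc}(E)}\Big)\Big| = 0. \tag{2.37}$$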



This result, with slightly stronger conditions on the distributions of the ensemble, was already proved in [23].
Similarly to [23], the extension of the universality from a small positive time to zero time requires a different
method, the Green function comparison theorem [22] in our approach. The reasons for universality at zero
time and at times bigger than 1/N are very different: Theorem 2.3 shows that the local correlation functions
have already reached their equilibrium under the Dyson Brownian motion flow for any time larger than
1/N. For times smaller than 1/N, in particular in the important case t = 0, the universality is valid because
we can compare the local correlation functions at time t = 0 with the ones generated by the flow at time
$t = N^{-\varepsilon}$ with specially adjusted initial data (see, e.g., the Matching Lemma 3.4 in [23]). The same argument
as in Section 3 of [23] can be used to prove (2.37) from (2.36). In fact, since our new version of the strong
local ergodicity of Dyson Brownian motion, Theorem 2.3, holds for very short times, the two ensembles to
be compared are already very close to each other. Furthermore, effective error estimates instead of a limiting
statement (2.37) can also be obtained and the parameter b may also be chosen N-dependent. For the case
that b is N-independent, the time to local equilibrium as remarked above is $N^{-1+\varepsilon}$. Hence the condition
(2.6) can be replaced by the following condition: there are constants c, ε > 0 such that

$$\big|\{(i, j):\ N\sigma_{ij}^2 \le c\}\big| \le N^{2-\varepsilon}. \tag{2.38}$$
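Condition (2.38) only bounds the number of index pairs with small variances. A finite-N count can be written as follows (hypothetical helper names). As with (2.6), the actual condition concerns constants c, ε > 0 that are uniform in N, so a single count only illustrates it.

```python
from typing import List


def small_variance_pairs(sigma2: List[List[float]], c: float) -> int:
    """|{(i, j) : N sigma_ij^2 <= c}|, the left side of (2.38)."""
    n = len(sigma2)
    return sum(1 for row in sigma2 for s in row if n * s <= c)


def satisfies_condition_2_38(sigma2: List[List[float]], c: float, eps: float) -> bool:
    """Finite-N check of |{(i, j) : N sigma_ij^2 <= c}| <= N^{2 - eps}."""
    n = len(sigma2)
    return small_variance_pairs(sigma2, c) <= n ** (2.0 - eps)
```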

Since these extensions require only minor modifications of the current method, we will not pursue these 
directions in this paper. 
2.2 Edge distribution

Recall that $\lambda_N$ is the largest eigenvalue of the random matrix. Tracy and Widom [43, 44] identified the probability distribution functions of $\lambda_N$
for the classical Gaussian ensembles as

$$\lim_{N \to \infty}\mathbb{P}\big(N^{2/3}(\lambda_N - 2) \le s\big) = F_\beta(s), \tag{2.39}$$

where the function $F_\beta(s)$ can be computed in terms of Painlevé equations and β = 1, 2, 4 correspond to
the standard classical ensembles. The distribution of $\lambda_N$ is believed to be universal and independent of
the Gaussian structure. The strong local semicircle law, Theorem 2.1, combined with a modification of
the Green function comparison theorem (Theorem 6.3) implies the following version of universality of the
extreme eigenvalues.

Theorem 2.4 (Universality of extreme eigenvalues) Suppose that we have two $N\times N$ matrices, $H^{(v)}$ and $H^{(w)}$, with matrix elements $h_{ij}$ given by the random variables $N^{-1/2}v_{ij}$ and $N^{-1/2}w_{ij}$, respectively, with $v_{ij}$ and $w_{ij}$ satisfying the uniform subexponential decay condition (2.17). Let $\mathbb{P}^{\mathbf{v}}$ and $\mathbb{P}^{\mathbf{w}}$ denote the probability and $\mathbb{E}^{\mathbf{v}}$ and $\mathbb{E}^{\mathbf{w}}$ the expectation with respect to these collections of random variables. Suppose that Assumptions (A), (B), (C) hold for both ensembles. If the first two moments of $v_{ij}$ and $w_{ij}$ are the same, i.e.

$$\mathbb{E}^{\mathbf{v}}\,\bar v_{ij}^{\,u}v_{ij}^{\,u'}=\mathbb{E}^{\mathbf{w}}\,\bar w_{ij}^{\,u}w_{ij}^{\,u'}, \qquad 0\le u+u'\le 2, \qquad (2.40)$$

then there are $\varepsilon>0$ and $\delta>0$ depending on $\vartheta$ in (2.17) such that for any real parameter $s$ (which may depend on $N$) we have

$$\mathbb{P}^{\mathbf{v}}\big(N^{2/3}(\lambda_N-2)\le s-N^{-\varepsilon}\big)-N^{-\delta}\le\mathbb{P}^{\mathbf{w}}\big(N^{2/3}(\lambda_N-2)\le s\big)\le\mathbb{P}^{\mathbf{v}}\big(N^{2/3}(\lambda_N-2)\le s+N^{-\varepsilon}\big)+N^{-\delta} \qquad (2.41)$$

for $N\ge N_0$ sufficiently large, where $N_0$ is independent of $s$. An analogous result holds for the smallest eigenvalue $\lambda_1$.

Theorem 2.4 can be extended to finite correlation functions of extreme eigenvalues. For example, we have the following extension of (2.41):

$$\begin{aligned}
&\mathbb{P}^{\mathbf{v}}\big(N^{2/3}(\lambda_N-2)\le s_1-N^{-\varepsilon},\ldots,N^{2/3}(\lambda_{N-k}-2)\le s_{k+1}-N^{-\varepsilon}\big)-N^{-\delta}\\
&\quad\le\mathbb{P}^{\mathbf{w}}\big(N^{2/3}(\lambda_N-2)\le s_1,\ldots,N^{2/3}(\lambda_{N-k}-2)\le s_{k+1}\big)\\
&\quad\le\mathbb{P}^{\mathbf{v}}\big(N^{2/3}(\lambda_N-2)\le s_1+N^{-\varepsilon},\ldots,N^{2/3}(\lambda_{N-k}-2)\le s_{k+1}+N^{-\varepsilon}\big)+N^{-\delta}
\end{aligned} \qquad (2.42)$$

for all fixed $k$ and $N$ sufficiently large. The proof of (2.42) is similar to that of (2.41) and we will not provide details, except for stating the general form of the Green function comparison theorem (Theorem 6.4) needed in this case. We remark that edge universality is usually formulated in terms of joint distributions of edge eigenvalues in the form (2.42) with fixed parameters $s_1,s_2,\ldots$. Our result holds uniformly in these parameters, i.e., they may depend on $N$. However, the interesting regime is $|s_j|\le(\log N)^{C\log\log N}$; otherwise the rigidity estimate (2.25) gives a stronger control than (2.42).

The edge universality for Wigner matrices was first proved via the moment method by Soshnikov [37] (see also the earlier work [34]) for Hermitian and orthogonal ensembles with symmetric distributions, which ensure that all odd moments vanish. By combining the moment method with Chebyshev polynomials, Sodin proved edge universality of band matrices and of some special classes of sparse matrices [35, 36].

The removal of the symmetry assumption was not straightforward. The approach of [35, 36] is restricted to ensembles with symmetric distributions. The symmetry assumption was partially removed in [31, 32], and significant progress was made in [41], which assumes only that the first three moments of two Wigner ensembles are identical. In other words, the symmetry assumption was replaced by the vanishing third moment condition for Wigner matrices. For a special class of ensembles, the Gaussian divisible Hermitian ensembles, edge universality was proved [27] under the sole condition that the fourth moment is finite, which in our scaling means that $\mathbb{E}|\sqrt N h_{ij}|^4\le C$ for some positive constant $C$. Using this result [27], one can remove the vanishing third moment condition in [41] for Hermitian Wigner ensembles.

In comparison with these results, Theorem 2.4 does not imply the edge universality of band matrices or sparse matrices [35, 36], but it implies in particular that, for the purpose of identifying the distribution of the top eigenvalue of a generalized Wigner matrix with the subexponential decay condition, it suffices to consider generalized Wigner ensembles with Gaussian distribution. Since the distributions of the top eigenvalues of the Gaussian Wigner ensembles are given by $F_\beta$ (2.39), Theorem 2.4 implies the edge universality of standard Wigner matrices under the subexponential decay assumption alone. We remark that one can use Theorem 2.2 as an input in the approach of [27] to prove that the distributions of the top eigenvalues of generalized Hermitian Wigner ensembles with Gaussian distributions are given by $F_2$. Therefore the Tracy-Widom distribution also holds for any generalized Hermitian Wigner ensemble with subexponential decay. For ensembles in other symmetry classes (e.g., symmetric Wigner ensembles), however, there is no corresponding result identifying the distribution of the top eigenvalue with $F_\beta$ if the variances are allowed to vary.

Finally, we comment that the subexponential decay assumption in our approach, though it can be weakened, is far from optimal; see [4, 7, 33, 38] for discussions of optimal moment assumptions. Our approach based on the local semicircle law, however, gives both bulk and edge universality, and the symmetry of the distribution of the matrix elements plays no role.

3 A priori bound for the strong local semicircle law

We first prove a weaker form of Theorem 2.1; in Section 4 we will use this a priori bound to obtain the stronger form claimed in Theorem 2.1.

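Theorem 3.1 Let $H=(h_{ij})$ be a Hermitian $N\times N$ random matrix, $N\ge 3$, with $\mathbb{E}h_{ij}=0$, $1\le i,j\le N$, and assume that the variances $\sigma_{ij}^2$ satisfy Assumptions (A), (B), (C) and that the matrix elements have the uniform subexponential decay (2.17). Then there exist constants $0<\phi<1$, $C\ge 1$ and $c>0$, depending only on $\vartheta$ from (2.17), $\delta_\pm$ from Assumption (B) and $C_0$ from Assumption (C), such that for any $\ell$ with $4/\phi\le\ell\le C\log N/\log\log N$ and for any $z=E+i\eta\in S_\ell$ we have

$$\mathbb{P}\Big\{\max_i|G_{ii}(z)-m_{sc}(z)|\ge\frac{(\log N)^{4\ell}}{(N\eta)^{1/3}}\Big\}\le C\exp\big[-c(\log N)^{\phi\ell}\big] \qquad (3.1)$$

and

$$\mathbb{P}\Big\{\max_{i\ne j}|G_{ij}(z)|\ge\frac{(\log N)^{4\ell}}{(N\eta)^{1/3}}\Big\}\le C\exp\big[-c(\log N)^{\phi\ell}\big] \qquad (3.2)$$

for any sufficiently large $N\ge N_0(\vartheta,\delta_\pm,C_0)$.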
We remark that the probabilistic estimates in Theorem 3.1 are stated for fixed $z\in S_\ell$, but it is easy to deduce from them probabilistic statements that hold simultaneously for all $z$, e.g.

$$\mathbb{P}\Big(\bigcup_{z\in S_\ell}\Big\{\max_i|G_{ii}(z)-m_{sc}(z)|\ge\frac{(\log N)^{4\ell}}{(N\eta)^{1/3}}\Big\}\Big)\le C\exp\big[-c(\log N)^{\phi\ell}\big].$$

This holds true because in the set $S_\ell$ the Green function and $m_{sc}(z)$ are Lipschitz continuous in $z$ with a Lipschitz constant bounded by $\eta^{-2}\le N^2$; for example, $|\partial_zG_{ij}(z)|\le|\operatorname{Im}z|^{-2}\le N^2$. Consider an $N^{-10}$-net in the compact set $S_\ell$, i.e., a set of points $\{z_k\}\subset S_\ell$ such that $\min_k|z-z_k|\le N^{-10}$ for any $z\in S_\ell$ and such that the cardinality of $\{z_k\}$ is at most $CN^{20}$. Using that the estimates (3.1)-(3.2) hold simultaneously for all points $z_k$ (since these estimates decay faster than any polynomial in $N$ by $\phi\ell>1$), we see that similar estimates, with a smaller $c$, hold simultaneously for any $z\in S_\ell$.
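The counting behind the net argument can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original argument: it takes the per-point bound $C\exp[-c(\log N)^{\phi\ell}]$ with the hypothetical choices $C=c=1$ and $\phi\ell=4$, multiplies it by a net of $N^{20}$ points, and reports the natural logarithm of the resulting union bound, together with the Lipschitz error $N^2\cdot N^{-10}$ incurred when passing from a net point to an arbitrary $z\in S_\ell$.

```python
import math

def log_union_bound(N, phi_ell, c=1.0, C=1.0, net_exponent=20):
    """log of [C * N**net_exponent net points] x [C * exp(-c (log N)**phi_ell)].

    The constants c, C and the exponent phi_ell are illustrative assumptions.
    """
    logN = math.log(N)
    return 2 * math.log(C) + net_exponent * logN - c * logN ** phi_ell

def lipschitz_error(N):
    """Lipschitz constant N**2 times the net spacing N**-10, i.e. N**-8."""
    return N ** 2 * N ** -10.0

for N in (10**3, 10**6, 10**9):
    print(N, log_union_bound(N, phi_ell=4.0), lipschitz_error(N))
```

The union bound is already astronomically small for moderate $N$ because $(\log N)^{\phi\ell}$ with $\phi\ell>1$ eventually beats $20\log N$.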

We will follow the self-consistent perturbation ideas initiated in [22, 23]. We first introduce some notation.

Definition 3.1 Let $\mathbb{T}=\{k_1,k_2,\ldots,k_t\}\subset\{1,2,\ldots,N\}$ be an unordered set of $|\mathbb{T}|=t$ elements and let $H^{(\mathbb{T})}$ be the $(N-t)\times(N-t)$ minor of $H$ obtained by removing the $k_i$-th ($1\le i\le t$) rows and columns. For $\mathbb{T}=\emptyset$, we define $H^{(\emptyset)}=H$. Similarly, we define $\mathbf{a}^{j(\mathbb{T})}$ to be the $j$-th column of $H$ with the $k_i$-th ($1\le i\le t$) elements removed. Sometimes we just use the short notation $\mathbf{a}^j=\mathbf{a}^{j(\mathbb{T})}$. Note that the $\ell$-th entry of $\mathbf{a}^j$ is $a^j_\ell=h_{\ell j}$ for $\ell\notin\mathbb{T}$. For any $\mathbb{T}\subset\{1,2,\ldots,N\}$ we introduce the following notation:





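$$\begin{aligned}
G_{ij}^{(\mathbb{T})}&:=(H^{(\mathbb{T})}-z)^{-1}(i,j),\\
Z_{ij}^{(\mathbb{T})}&:=\mathbf{a}^{i(\mathbb{T})*}\cdot(H^{(\mathbb{T})}-z)^{-1}\mathbf{a}^{j(\mathbb{T})}=\sum_{k,l\notin\mathbb{T}}\overline{a^i_k}\,G_{kl}^{(\mathbb{T})}\,a^j_l,\\
K_{ij}^{(\mathbb{T})}&:=h_{ij}-z\delta_{ij}-Z_{ij}^{(\mathbb{T})}.
\end{aligned} \qquad (3.3)$$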
These quantities depend on z, but we mostly neglect this dependence in the notation. 
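For readers who want to experiment with the notation, here is a minimal sketch of the minor $H^{(\mathbb{T})}$ and the truncated column $\mathbf{a}^{j(\mathbb{T})}$ from Definition 3.1. Indices are 0-based (the text uses $1,\ldots,N$), and the functions `minor` and `column` are hypothetical helpers written for this illustration, not part of any library.

```python
def minor(H, T):
    """H^(T): delete the rows and columns whose (0-based) index lies in T."""
    keep = [i for i in range(len(H)) if i not in T]
    return [[H[i][j] for j in keep] for i in keep]

def column(H, j, T=()):
    """a^{j(T)}: the j-th column of H with the entries indexed by T removed."""
    return [H[l][j] for l in range(len(H)) if l not in T]

H = [[1, 2, 3],
     [2, 4, 5],
     [3, 5, 6]]
print(minor(H, {1}))       # [[1, 3], [3, 6]]
print(column(H, 1, {1}))   # [2, 5]: the entries H[l][1] with l != 1
```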

The following formulas were proved in Lemma 4.2 of [22]. 

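Lemma 3.2 (Self-consistent perturbation formulas) Let $\mathbb{T}\subset\{1,2,\ldots,N\}$. For simplicity, we use the notation $(i\mathbb{T})$ for $(\{i\}\cup\mathbb{T})$ and $(ij\mathbb{T})$ for $(\{i,j\}\cup\mathbb{T})$. Then we have the following identities:

1. For any $i\notin\mathbb{T}$,
$$G_{ii}^{(\mathbb{T})}=\big(K_{ii}^{(i\mathbb{T})}\big)^{-1}. \qquad (3.4)$$

2. For $i\ne j$ and $i,j\notin\mathbb{T}$,
$$G_{ij}^{(\mathbb{T})}=-G_{jj}^{(\mathbb{T})}G_{ii}^{(j\mathbb{T})}K_{ij}^{(ij\mathbb{T})}=-G_{ii}^{(\mathbb{T})}G_{jj}^{(i\mathbb{T})}K_{ij}^{(ij\mathbb{T})}. \qquad (3.5)$$

3. For $i\ne j$ and $i,j\notin\mathbb{T}$,
$$G_{ii}^{(\mathbb{T})}-G_{ii}^{(j\mathbb{T})}=G_{ij}^{(\mathbb{T})}G_{ji}^{(\mathbb{T})}\big(G_{jj}^{(\mathbb{T})}\big)^{-1}. \qquad (3.6)$$

4. For any indices $i$, $j$ and $k$ that are different and $i,j,k\notin\mathbb{T}$,
$$G_{ij}^{(\mathbb{T})}-G_{ij}^{(k\mathbb{T})}=G_{ik}^{(\mathbb{T})}G_{kj}^{(\mathbb{T})}\big(G_{kk}^{(\mathbb{T})}\big)^{-1}. \qquad (3.7)$$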
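The identities of Lemma 3.2 are exact algebraic facts (Schur complement formulas), so they can be checked numerically on a small random Hermitian matrix. The following pure-Python sketch does this for $\mathbb{T}=\emptyset$ with a hand-written Gauss-Jordan inverse; the matrix size, the spectral parameter and the Gaussian entries are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import random

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting for a square complex matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

def resolvent(H, z, removed=()):
    """G^(T) = (H^(T) - z)^{-1} as a dict keyed by the original index pairs (k, l)."""
    keep = [k for k in range(len(H)) if k not in removed]
    M = [[H[k][l] - (z if k == l else 0) for l in keep] for k in keep]
    inv = inverse(M)
    return {(keep[a], keep[b]): inv[a][b]
            for a in range(len(keep)) for b in range(len(keep))}

def random_hermitian(n, seed=0):
    """Hermitian test matrix with independent Gaussian entries of variance about 1/n."""
    rng = random.Random(seed)
    H = [[0j] * n for _ in range(n)]
    for i in range(n):
        H[i][i] = complex(rng.gauss(0, 1), 0) / n ** 0.5
        for j in range(i + 1, n):
            H[i][j] = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / (2 * n) ** 0.5
            H[j][i] = H[i][j].conjugate()
    return H

n, z = 6, complex(0.3, 0.5)
H = random_hermitian(n)
i, j, k = 0, 1, 2
G, Gi, Gj, Gk = (resolvent(H, z, T) for T in ((), (i,), (j,), (k,)))
Gij = resolvent(H, z, (i, j))

# (3.4): G_ii = 1 / K_ii^(i), K_ii^(i) = h_ii - z - sum_{k,l != i} h_ik G^(i)_kl h_li
others = [a for a in range(n) if a != i]
Z_ii = sum(H[i][a] * Gi[(a, b)] * H[b][i] for a in others for b in others)
err_34 = abs(G[(i, i)] - 1 / (H[i][i] - z - Z_ii))

# (3.5): G_ij = -G_ii G^(i)_jj K_ij^(ij), K_ij^(ij) = h_ij - sum_{k,l not in {i,j}} h_ik G^(ij)_kl h_lj
rest = [a for a in range(n) if a not in (i, j)]
K_ij = H[i][j] - sum(H[i][a] * Gij[(a, b)] * H[b][j] for a in rest for b in rest)
err_35 = abs(G[(i, j)] + G[(i, i)] * Gi[(j, j)] * K_ij)

# (3.6): G_ii - G^(j)_ii = G_ij G_ji / G_jj
err_36 = abs(G[(i, i)] - Gj[(i, i)] - G[(i, j)] * G[(j, i)] / G[(j, j)])

# (3.7): G_ij - G^(k)_ij = G_ik G_kj / G_kk
err_37 = abs(G[(i, j)] - Gk[(i, j)] - G[(i, k)] * G[(k, j)] / G[(k, k)])

print(max(err_34, err_35, err_36, err_37))
```

The printed maximal error should be at the level of floating-point round-off, since no approximation is involved.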
The following large deviation estimates concerning independent random variables were proved in Appendix B of [22].

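Lemma 3.3 Let $a_i$ ($1\le i\le N$) be independent complex random variables with mean zero, variance $\sigma^2$ and having a uniform subexponential decay

$$\mathbb{P}(|a_i|\ge x\sigma)\le\vartheta^{-1}\exp(-x^{\vartheta}), \qquad \forall x\ge 1, \qquad (3.8)$$

with some $\vartheta>0$. Let $A_i,B_{ij}\in\mathbb{C}$ ($1\le i,j\le N$). Then there exists a constant $0<\phi<1$, depending on $\vartheta$, such that for any $\ell\ge 1$ we have

$$\mathbb{P}\Big\{\Big|\sum_{i=1}^Na_iA_i\Big|\ge(\log N)^{\ell}\sigma\Big(\sum_i|A_i|^2\Big)^{1/2}\Big\}\le\exp\big[-(\log N)^{\phi\ell}\big], \qquad (3.9)$$

$$\mathbb{P}\Big\{\Big|\sum_{i=1}^N\overline{a_i}B_{ii}a_i-\sum_{i=1}^N\sigma^2B_{ii}\Big|\ge(\log N)^{\ell}\sigma^2\Big(\sum_i|B_{ii}|^2\Big)^{1/2}\Big\}\le\exp\big[-(\log N)^{\phi\ell}\big], \qquad (3.10)$$

$$\mathbb{P}\Big\{\Big|\sum_{i\ne j}\overline{a_i}B_{ij}a_j\Big|\ge(\log N)^{\ell}\sigma^2\Big(\sum_{i\ne j}|B_{ij}|^2\Big)^{1/2}\Big\}\le\exp\big[-(\log N)^{\phi\ell}\big] \qquad (3.11)$$

for any sufficiently large $N\ge N_0$, where $N_0=N_0(\vartheta)$ depends on $\vartheta$.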
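To get a feeling for (3.11), one can sample the off-diagonal quadratic form and compare it with its natural scale $\sigma^2(\sum_{i\ne j}|B_{ij}|^2)^{1/2}$. The sketch below is a Monte Carlo illustration only, under the assumption that the $a_i$ are centered complex Gaussians (which satisfy (3.8)); the size $N=40$, the random matrix $B$ and the seed are arbitrary. For this choice the normalized form has second moment exactly 1, so the sampled ratios should be of order one, far below the threshold $(\log N)^{\ell}$; the sketch says nothing about the tail exponent $\phi$.

```python
import math
import random

def normalized_offdiag_form(a, B, sigma2):
    """|sum_{i != j} conj(a_i) B_ij a_j| / (sigma^2 (sum_{i != j} |B_ij|^2)^{1/2})."""
    N = len(a)
    form = sum(a[i].conjugate() * B[i][j] * a[j]
               for i in range(N) for j in range(N) if i != j)
    scale = math.sqrt(sum(abs(B[i][j]) ** 2 for i in range(N) for j in range(N) if i != j))
    return abs(form) / (sigma2 * scale)

rng = random.Random(1)
N = 40
sigma2 = 1.0 / N                   # variance of each a_i (illustrative choice)
s = math.sqrt(sigma2 / 2)          # real and imaginary parts each carry half of it
B = [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(N)] for _ in range(N)]

ratios = []
for _ in range(200):
    a = [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(N)]
    ratios.append(normalized_offdiag_form(a, B, sigma2))
mean_square = sum(r * r for r in ratios) / len(ratios)
print(max(ratios), mean_square)
```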



The following lemma (Lemma 4.2 from [23]) collects elementary properties of the Stieltjes transform of the semicircle law. As a technical note, we use the notation $f\sim g$ for two positive functions in some domain $D$ if there is a positive universal constant $C$ such that $C^{-1}\le f(z)/g(z)\le C$ holds for all $z\in D$.

Lemma 3.4 We have for all $z$ with $\operatorname{Im}z>0$ that

$$|m_{sc}(z)|=|m_{sc}(z)+z|^{-1}\le 1. \qquad (3.12)$$

From now on, let $z=E+i\eta$ with $|E|\le 5$ and $0<\eta\le 10$, and set $\kappa=\big||E|-2\big|$. Then we have

$$|m_{sc}(z)|\sim 1, \qquad |1-m_{sc}^2(z)|\sim\sqrt{\kappa+\eta} \qquad (3.13)$$

and the following two bounds:

$$\operatorname{Im}m_{sc}(z)\sim\begin{cases}\dfrac{\eta}{\sqrt{\kappa+\eta}} & \text{if } \kappa\ge\eta \text{ and } |E|\ge 2,\\[2mm] \sqrt{\kappa+\eta} & \text{if } \kappa\le\eta \text{ or } |E|\le 2.\end{cases} \qquad (3.14)$$

□
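The properties (3.12)-(3.14) are easy to test numerically. In the sketch below, $m_{sc}$ is computed as the root of $m^2+zm+1=0$ (equivalently $m+1/(z+m)=0$) with $|m|<1$: the two roots have product 1, and (3.12) singles out the one inside the unit disk. The grid of $(E,\eta)$ values is an arbitrary choice. The script records the worst residual of the defining equation and the range of the ratios hidden in the $\sim$ relations of (3.13) and (3.14).

```python
import cmath
import math

def m_sc(z):
    """Stieltjes transform of the semicircle law for Im z > 0.

    m_sc solves m**2 + z*m + 1 = 0, i.e. m + 1/(z + m) = 0.  The two roots
    multiply to 1, so the one with |m| < 1 is selected, in line with (3.12).
    """
    s = cmath.sqrt(z * z - 4)
    m = (-z + s) / 2
    return m if abs(m) < 1 else (-z - s) / 2

worst_residual = 0.0
ratio_lo, ratio_hi = math.inf, 0.0    # range of |1 - m^2| / sqrt(kappa + eta), cf. (3.13)
im_lo, im_hi = math.inf, 0.0          # range of Im m / (predicted size), cf. (3.14)
for a in range(-20, 21):
    E = a / 4                                        # |E| <= 5
    for eta in (1e-3, 1e-2, 0.1, 1.0, 3.0, 10.0):    # 0 < eta <= 10
        z = complex(E, eta)
        m = m_sc(z)
        assert abs(m) <= 1 and m.imag > 0
        worst_residual = max(worst_residual, abs(m + 1 / (z + m)))
        kappa = abs(abs(E) - 2)
        r = abs(1 - m * m) / math.sqrt(kappa + eta)
        ratio_lo, ratio_hi = min(ratio_lo, r), max(ratio_hi, r)
        if kappa >= eta and abs(E) >= 2:
            predicted = eta / math.sqrt(kappa + eta)
        else:
            predicted = math.sqrt(kappa + eta)
        q = m.imag / predicted
        im_lo, im_hi = min(im_lo, q), max(im_hi, q)
print(worst_residual, (ratio_lo, ratio_hi), (im_lo, im_hi))
```

On this grid the ratios stay within fixed positive bounds, which is what the universal constants in (3.13)-(3.14) express.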
3.1 Self-consistent perturbation equations

Following [22, 23], we define the following quantities: 



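$$A_i:=\sigma_{ii}^2G_{ii}+\sum_{j\ne i}\sigma_{ij}^2\frac{G_{ij}G_{ji}}{G_{ii}}, \qquad (3.15)$$

$$Z_i:=\sum_{k,l\ne i}\overline{a^i_k}\,G^{(i)}_{kl}\,a^i_l-\mathbb{E}_{\mathbf{a}^i}\sum_{k,l\ne i}\overline{a^i_k}\,G^{(i)}_{kl}\,a^i_l=\sum_{k,l\ne i}\overline{a^i_k}\,G^{(i)}_{kl}\,a^i_l-\sum_{k\ne i}\sigma_{ik}^2G^{(i)}_{kk}, \qquad (3.16)$$

$$\Upsilon_i:=A_i+h_{ii}-Z_i, \qquad (3.17)$$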



where $\mathbb{E}_{\mathbf{a}^i}$ indicates the expectation with respect to the matrix elements in the $i$-th column. Using (3.4) from Lemma 3.2, we obtain the following system of self-consistent equations for the deviations $v_i=G_{ii}-m_{sc}$ of the diagonal matrix elements of the resolvent from $m_{sc}$:

$$v_i=G_{ii}-m_{sc}=\frac{1}{-z-m_{sc}-\big(\sum_j\sigma_{ij}^2v_j-\Upsilon_i\big)}-m_{sc}. \qquad (3.18)$$



For the off-diagonal terms, we will use equation (3.5). All the quantities defined so far depend on the spectral parameter $z=E+i\eta$, but we will mostly omit this fact from the notation.
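For the reader's convenience, here is the short computation behind (3.18), written out from the definitions (3.15)-(3.17). It only uses (3.4), (3.6) and the normalization $\sum_j\sigma_{ij}^2=1$; inverting the last line and subtracting $m_{sc}$ gives (3.18).

```latex
\begin{aligned}
G_{ii}^{-1} &= K_{ii}^{(i)} = h_{ii} - z - \sum_{k,l\neq i}\overline{a^i_k}\,G^{(i)}_{kl}\,a^i_l
  && \text{by (3.4)}\\
&= h_{ii} - z - \sum_{k\neq i}\sigma_{ik}^2 G^{(i)}_{kk} - Z_i
  && \text{by (3.16)}\\
&= h_{ii} - z - \sum_{k\neq i}\sigma_{ik}^2\Big(G_{kk} - \frac{G_{ki}G_{ik}}{G_{ii}}\Big) - Z_i
  && \text{by (3.6)}\\
&= -z - \sum_{k}\sigma_{ik}^2 G_{kk} + A_i + h_{ii} - Z_i
  && \text{by (3.15)}\\
&= -z - m_{sc} - \Big(\sum_j\sigma_{ij}^2 v_j - \Upsilon_i\Big)
  && \text{by (3.17) and } \textstyle\sum_j\sigma_{ij}^2 = 1.
\end{aligned}
```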

The key quantities $\Lambda$, $\Lambda_d$ and $\Lambda_o$ (2.16) appearing in Theorem 3.1 will typically be small, and we will prove in this section that their size is less than $(N\eta)^{-1/3}$, modulo logarithmic corrections. We thus define the exceptional (bad) event

$$\mathcal{B}=\mathcal{B}(z):=\big\{\Lambda_d(z)+\Lambda_o(z)>(\log N)^{-2}\big\}. \qquad (3.19)$$

We will always work in the complement set $\mathcal{B}^c$, i.e., we will have

$$\Lambda_d(z)+\Lambda_o(z)\le(\log N)^{-2}. \qquad (3.20)$$

We collect some basic properties of the Green function in the following elementary lemma.



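Lemma 3.5 Let $\mathbb{T}$ be a subset of $\{1,\ldots,N\}$. Then there exists a constant $C=C_{\mathbb{T}}$ depending on $C_0$ from (2.4) and on $|\mathbb{T}|$, the cardinality of $\mathbb{T}$, such that the following hold in $\mathcal{B}^c$:

$$|G_{kk}^{(\mathbb{T})}-m_{sc}|\le\Lambda_d+C\Lambda_o^2 \quad \text{for all } k\notin\mathbb{T}, \qquad (3.21)$$

$$\frac1C\le|G_{kk}^{(\mathbb{T})}|\le C \quad \text{for all } k\notin\mathbb{T}, \qquad (3.22)$$

$$\max_{k\ne l}|G_{kl}^{(\mathbb{T})}|\le C\Lambda_o, \qquad (3.23)$$

$$\max_i|A_i|\le\frac{C}{N}+C\Lambda_o^2 \qquad (3.24)$$

for any fixed $|\mathbb{T}|$ and for any sufficiently large $N$. We recall that all quantities depend on the spectral parameter $z$ and the estimates are uniform in $z=E+i\eta$ as long as $|E|\le 5$, $0<\eta\le 10$.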






Proof. For $\mathbb{T}=\emptyset$, the estimates (3.21) and (3.23) follow directly from the definitions (2.16). The bound (3.22) follows from (3.20) and $|m_{sc}(z)|\sim 1$, see (3.12) and (3.13). Finally, (3.24) follows from inserting (3.22), (3.23), (2.2) and (2.4) into (3.15). The general case can be proved by induction on $|\mathbb{T}|$ using the formulas (3.6) and (3.7), which guarantee that

$$\big|G_{kl}^{(\mathbb{T}')}-G_{kl}^{(\mathbb{T})}\big|\le C^*\Lambda_o^2 \qquad (3.25)$$

holds for any $\mathbb{T}'=\mathbb{T}\cup\{m\}$, where $C^*$ depends on the constant $C_{\mathbb{T}}$ from the induction hypothesis. In the set $\mathcal{B}^c$ and for sufficiently large $N$, depending on $|\mathbb{T}|$, the estimate (3.25) together with $|m_{sc}(z)|\sim 1$ guarantees that the lower bound in (3.22) continues to hold for $\mathbb{T}'$. The other estimates for $\mathbb{T}'$ follow directly from (3.25).

□

3.2 Estimate of the exceptional events 

The following lemma is a modification of Lemma 4.5 in [23]. It improves the estimate in the sense that the control parameter depends only on $\Lambda$ and not on $\Lambda_d$ and $\Lambda_o$ (see (2.16) for the definitions). Since $\Lambda$, being an averaged quantity, behaves better, this yields a stronger estimate.

For any $\ell>0$ we define the key control parameter $\Psi$, which is a random variable, by

$$\Psi(z):=(\log N)^{\ell}\sqrt{\frac{\Lambda(z)+\operatorname{Im}m_{sc}(z)}{N\eta}}. \qquad (3.26)$$



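We also define the events

$$\Omega_h:=\Big\{\max_{i,j}\frac{|h_{ij}|}{\sigma_{ij}}\ge(\log N)^{\ell/10}\Big\}\cup\Big\{\Big|\sum_ih_{ii}\Big|\ge(\log N)^{\ell/10}\Big\}, \qquad (3.27)$$

$$\Omega_d(z):=\Big\{\max_i|Z_i(z)|\ge\Psi(z)\Big\}, \qquad \Omega_o(z):=\Big\{\max_{i\ne j}|Z_{ij}^{(ij)}(z)|\ge\Psi(z)\Big\},$$

and we let

$$\Omega(z):=\Omega_h\cup\big[(\Omega_d(z)\cup\Omega_o(z))\cap\mathcal{B}(z)^c\big] \qquad (3.28)$$

be the set of exceptional events. These definitions depend on the parameter $\ell$, which we omit from the notation.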
The main reason that $\Psi$ emerges as the key controlling parameter can be seen from the following consideration. In order to estimate the off-diagonal term $G_{ij}$, we need to bound $K_{ij}^{(ij)}$ in (3.5) and thus $Z_{ij}^{(ij)}$. By the large deviation estimate (3.11), we have

$$|Z_{ij}^{(ij)}|\le C(\log N)^{\ell}\Big(\sum_{k,l\ne i,j}|\sigma_{ik}G^{(ij)}_{kl}\sigma_{lj}|^2\Big)^{1/2}\le C(\log N)^{\ell}\,\frac1N\Big(\sum_{k,l\ne i,j}|G^{(ij)}_{kl}|^2\Big)^{1/2} \qquad (3.29)$$

with high probability. Here we have used that $\sigma_{ij}^2\le C_0/N$ from (2.4). For any normal matrix $A$, we have

$$\sum_j|A_{ij}|^2=(AA^*)_{ii}=(|A|^2)_{ii}, \qquad (3.30)$$

where $|A|^2:=AA^*$.

Applying this identity to the Green function $G=(H-z)^{-1}$, we obtain the following "Ward identity":

$$\sum_j|G_{ij}|^2=\sum_\alpha\frac{|u_\alpha(i)|^2}{|\lambda_\alpha-z|^2}=\frac1\eta\operatorname{Im}G_{ii}, \qquad (3.31)$$

where $u_\alpha$ and $\lambda_\alpha$ are the eigenvectors and eigenvalues of $H$. The term "Ward identity" comes from quantum field theory, where it denotes an identity derived from a conservation law or a symmetry of a system. In our case, the symmetry is generated by the global phase multiplication $e^{i\theta}$, but this connection is not important for our purpose.

Applying (3.31) to estimate the last term in (3.29) and neglecting the superscript $(ij)$, we can bound

$$|Z_{ij}|\le C(\log N)^{\ell}\sqrt{\frac{1}{N^2\eta}\sum_k\operatorname{Im}G_{kk}}\le C(\log N)^{\ell}\sqrt{\frac{\Lambda(z)+\operatorname{Im}m_{sc}(z)}{N\eta}},$$

where we have used the definition of $\Lambda$ in the last inequality. Notice that the control parameter $\Psi$ appears naturally in this estimate. Furthermore, it is $\operatorname{Im}m_{sc}(z)$ which appears in the numerator, not $|m_{sc}(z)|$. This is the fundamental reason that we are able to obtain optimal estimates up to the edges of the spectrum: near the edges, $\operatorname{Im}m_{sc}(z)$ is small while $|m_{sc}(z)|$ stays near 1.
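The Ward identity (3.31) holds for every Hermitian matrix and every $z$ with $\eta=\operatorname{Im}z>0$, because $G-G^*=2i\eta\,GG^*$. A minimal numerical check, using a hand-written Gauss-Jordan inverse and an arbitrary random Hermitian test matrix (size, seed and $z$ are illustrative choices):

```python
import random

def inverse(M):
    """Gauss-Jordan inverse with partial pivoting for a square complex matrix."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    return [row[n:] for row in A]

rng = random.Random(7)
n, z = 8, complex(0.4, 0.05)
H = [[0j] * n for _ in range(n)]
for i in range(n):
    H[i][i] = complex(rng.gauss(0, 1), 0) / n ** 0.5
    for j in range(i + 1, n):
        H[i][j] = complex(rng.gauss(0, 1), rng.gauss(0, 1)) / (2 * n) ** 0.5
        H[j][i] = H[i][j].conjugate()

G = inverse([[H[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)])
worst = 0.0
for i in range(n):
    lhs = sum(abs(G[i][j]) ** 2 for j in range(n))   # sum_j |G_ij|^2
    rhs = G[i][i].imag / z.imag                      # Im G_ii / eta
    worst = max(worst, abs(lhs - rhs) / rhs)
print(worst)
```

Summing the identity over $i$ gives $\sum_{k,l}|G_{kl}|^2=N\operatorname{Im}m(z)/\eta$, which is exactly how the sum in (3.29) is turned into $\Lambda+\operatorname{Im}m_{sc}$.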

Lemma 3.6 There exist a constant $0<\phi<1$, depending on $\vartheta$ (2.17), and universal constants $C\ge 1$, $c>0$, such that for any $\ell$ with $4/\phi\le\ell\le C\log N/\log\log N$ and for any $z\in S_\ell$ we have

$$\mathbb{P}(\Omega(z))\le C\exp\big[-c(\log N)^{\phi\ell}\big], \qquad (3.32)$$

and we also have the pointwise statement

$$(\log N)^{\ell/2}\Lambda_o(z)+\max_i|\Upsilon_i(z)|\le\Psi(z) \quad \text{in } \Omega(z)^c\cap\mathcal{B}(z)^c \qquad (3.33)$$

for any sufficiently large $N\ge N_0(\vartheta)$.

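Proof. There exists $0<\phi<1$, depending on $\vartheta$, such that the following two estimates hold for any $\ell\ge 4/\phi$:

$$\mathbb{P}\big\{|h_{ij}|\ge(\log N)^{\ell/10}\sigma_{ij}\big\}\le C\exp\big[-(\log N)^{\phi\ell}\big], \qquad \forall\,i,j,$$

by (2.17), and

$$\mathbb{P}\Big\{\Big|\sum_{i=1}^Nh_{ii}\Big|\ge(\log N)^{\ell/10}\Big\}\le C\exp\big[-(\log N)^{\phi\ell}\big]$$

by (2.4) and the large deviation principle for sums of independent random variables (e.g., (3.9)). Thus

$$\mathbb{P}(\Omega_h)\le C\exp\big[-c(\log N)^{\phi\ell}\big], \qquad (3.34)$$

so we can work on the complement set $\Omega_h^c$. Note that

$$\Omega^c\cap\mathcal{B}^c=\Omega_h^c\cap\Omega_d^c\cap\Omega_o^c\cap\mathcal{B}^c. \qquad (3.35)$$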
Fix $z\in S_\ell$. We will prove, possibly with a smaller $\phi$, that for $\ell\ge 4/\phi$ we have

$$\mathbb{P}\big(\Omega_h^c\cap\Omega_d(z)\cap\mathcal{B}^c(z)\big)\le C\exp\big[-c(\log N)^{\phi\ell}\big] \qquad (3.36)$$

and

$$\mathbb{P}\big(\Omega_h^c\cap\Omega_o(z)\cap\mathcal{B}^c(z)\big)\le C\exp\big[-c(\log N)^{\phi\ell}\big], \qquad (3.37)$$

and this will prove (3.32).

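To prove the diagonal estimate (3.36), we can choose a sufficiently small $\phi>0$ (depending on $\vartheta$) and apply the large deviation bounds (3.10) and (3.11) from Lemma 3.3 to obtain that for any fixed $i$

$$|Z_i|\le(\log N)^{\ell/3}\Big(\sum_{k,l\ne i}|\sigma_{ik}G^{(i)}_{kl}\sigma_{li}|^2\Big)^{1/2} \qquad (3.38)$$

holds with a probability larger than $1-C\exp[-c(\log N)^{\phi\ell}]$ for sufficiently large $N$. From the Ward identity (3.31) and $\sigma_{ij}^2\le C_0/N$ (by (2.4) and (3.21)), we have

$$\sum_{k,l\ne i}|\sigma_{ik}G^{(i)}_{kl}\sigma_{li}|^2\le\frac{C}{N^2}\sum_{k,l\ne i}|G^{(i)}_{kl}|^2=\frac{C}{N^2\eta}\sum_{k\ne i}\operatorname{Im}G^{(i)}_{kk}. \qquad (3.39)$$

Since we are in the set $\mathcal{B}^c$, we have $\Lambda_d+\Lambda_o\le(\log N)^{-2}$. Thus from (3.6) and (3.22) we have that

$$\operatorname{Im}G^{(i)}_{kk}\le\operatorname{Im}G_{kk}+|G^{(i)}_{kk}-G_{kk}|\le\operatorname{Im}G_{kk}+C|G_{ik}|^2\le\operatorname{Im}G_{kk}+C\Lambda_o^2. \qquad (3.40)$$

The last term of (3.39) is therefore bounded by

$$\frac{C}{N^2\eta}\sum_{k\ne i}\big(\operatorname{Im}G_{kk}+C\Lambda_o^2\big)\le C\,\frac{\Lambda+\Lambda_o^2+\operatorname{Im}m_{sc}}{N\eta} \quad \text{in } \mathcal{B}^c. \qquad (3.41)$$

We have thus proved that for any $z\in S_\ell$

$$|Z_i(z)|\le C(\log N)^{\ell/3}\sqrt{\frac{\Lambda(z)+\Lambda_o^2(z)+\operatorname{Im}m_{sc}(z)}{N\eta}} \quad \text{in } \mathcal{B}^c(z) \qquad (3.42)$$

holds with a probability larger than $1-C\exp[-c(\log N)^{\phi\ell}]$ for sufficiently large $N$.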

Similarly, for the off-diagonal estimate (3.37), for any fixed $i\ne j$, we have from (3.11) that

$$|Z_{ij}^{(ij)}|\le C(\log N)^{\ell/3}\Big(\sum_{k,l\ne i,j}|\sigma_{ik}G^{(ij)}_{kl}\sigma_{lj}|^2\Big)^{1/2} \qquad (3.43)$$

holds with a probability larger than $1-C\exp[-c(\log N)^{\phi\ell}]$ for sufficiently large $N$. Similarly to the proof of (3.42) for $Z_i$, we have

$$|Z_{ij}^{(ij)}(z)|\le C(\log N)^{\ell/3}\sqrt{\frac{\Lambda(z)+\Lambda_o^2(z)+\operatorname{Im}m_{sc}(z)}{N\eta}} \quad \text{in } \mathcal{B}^c(z) \qquad (3.44)$$

for any $z\in S_\ell$, with a probability larger than $1-C\exp[-c(\log N)^{\phi\ell}]$ for sufficiently large $N$.

Using Lemma 3.5, we have $|G_{ii}|\le C$ and $|G^{(i)}_{jj}|\le C$ in the set $\mathcal{B}^c$. From (3.5), we can thus estimate the off-diagonal term $G_{ij}$ by

$$|G_{ij}|=|G_{ii}||G^{(i)}_{jj}||K^{(ij)}_{ij}|\le C\big(|h_{ij}|+|Z^{(ij)}_{ij}|\big), \qquad i\ne j, \quad \text{in } \mathcal{B}^c. \qquad (3.45)$$

Hence in the event $\mathcal{B}^c\cap\Omega_h^c$

$$\Lambda_o=\max_{i\ne j}|G_{ij}|\le\frac{C(\log N)^{\ell/10}}{\sqrt N}+C(\log N)^{\ell/3}\sqrt{\frac{\Lambda+\Lambda_o^2+\operatorname{Im}m_{sc}}{N\eta}} \qquad (3.46)$$

holds with a probability larger than $1-C\exp[-c(\log N)^{\phi\ell}]$ for sufficiently large $N$.

Recall that $N\eta\ge(\log N)^{10\ell}$ on the set $S_\ell$; since $\ell\ge 4/\phi\ge 4$, we have $(\log N)^{\ell/3}\ll\sqrt{N\eta}$, and thus the $\Lambda_o^2$ term on the right hand side of (3.46) can be absorbed into the left side for sufficiently large $N$. Furthermore, by (3.14), we have $\operatorname{Im}m_{sc}(z)\ge c\eta$ with a universal positive constant $c$ for any $z\in S_\ell$. Thus the first term on the right hand side of (3.46) can be bounded by

$$\frac{C(\log N)^{\ell/10}}{\sqrt N}\le(\log N)^{\ell/3}\sqrt{\frac{\operatorname{Im}m_{sc}(z)}{N\eta}}$$

for large enough $N$, and thus it can be absorbed into the second term. We conclude that

$$\mathbb{P}\Big(\big\{\Lambda_o>C(\log N)^{-2\ell/3}\Psi\big\}\cap\mathcal{B}^c\cap\Omega_h^c\Big)\le C\exp\big[-c(\log N)^{\phi\ell}\big]. \qquad (3.47)$$

Inserting this bound into (3.42) and (3.44), we have proved (3.36) and (3.37). Finally, the estimate (3.33) for $\Upsilon$ and $\Lambda_o$ is a simple consequence of (3.47), the definition (3.17), the bound (3.24), the definition of $\Omega_d$ and the fact that $\Omega^c\cap\mathcal{B}^c\subset\Omega_h^c$. This completes the proof of Lemma 3.6. □

3.3 Analysis of the self-consistent equation 

Now we start using the self-consistent equation (3.18). Since

$$\Big|\sum_j\sigma_{ij}^2v_j-\Upsilon_i\Big|\le\Lambda_d+|\Upsilon_i|,$$

the bound (3.12) allows us to expand the denominator in (3.18) as long as $\Lambda_d+|\Upsilon_i|\le\frac12$. In this case, using (2.12), we obtain the following equation for $v_i$:

$$v_i=m_{sc}^2\Big(\sum_j\sigma_{ij}^2v_j-\Upsilon_i\Big)+m_{sc}^3\Big(\sum_j\sigma_{ij}^2v_j-\Upsilon_i\Big)^2+O\Big(\sum_j\sigma_{ij}^2v_j-\Upsilon_i\Big)^3. \qquad (3.48)$$

Recall that $B$ denotes the $N\times N$ matrix of covariances, $B=(\sigma_{ij}^2)$. Thus we can rewrite the last equation as

$$\big[(1-m_{sc}^2B)v\big]_i=-m_{sc}^2\Upsilon_i+m_{sc}^3\big((Bv)_i-\Upsilon_i\big)^2+O\big((Bv)_i-\Upsilon_i\big)^3.$$
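The expansion (3.48) can be made quantitative. Writing $x=\sum_j\sigma_{ij}^2v_j-\Upsilon_i$ and using $-z-m_{sc}=1/m_{sc}$ from (2.12), equation (3.18) reads $v_i=1/(m_{sc}^{-1}-x)-m_{sc}$, and a geometric series gives the exact remainder $m_{sc}^4x^3/(1-m_{sc}x)$, which is at most $2|x|^3$ when $|x|\le\frac12$ and $|m_{sc}|\le 1$. The snippet checks this for one illustrative value of $m_{sc}$, chosen only so that $|m|\le 1$.

```python
def remainder(m, x):
    """v - (m^2 x + m^3 x^2) with v = 1/(1/m - x) - m, cf. (3.18) and (3.48)."""
    v = 1 / (1 / m - x) - m
    return v - (m ** 2 * x + m ** 3 * x ** 2)

m = complex(-0.3, 0.9)   # illustrative value with |m| <= 1
for x in (0.1, 0.01, 0.001, 0.1j):
    exact = m ** 4 * x ** 3 / (1 - m * x)   # closed form of the third-order remainder
    print(x, abs(remainder(m, x)), abs(exact), abs(remainder(m, x)) / abs(x) ** 3)
```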
We will first use this equation to estimate $v_i-[v]$, i.e. the deviation of $v_i$ from its average (Lemma 3.8). In the second step, we will add up (3.48) for all $i$ and obtain an equation for $[v]$ (Lemma 3.9). Finally, we use a dichotomy argument to estimate $\Lambda=|[v]|$ in Lemma 3.10.

By the normalization assumption $\sum_j\sigma_{ij}^2=1$, the vector $\mathbf{e}=(1,1,\ldots,1)$ is the (unique) eigenvector of $B$ with eigenvalue 1. We introduce the notation

$$q=q(z):=\max\big\{\delta_+,\,|1-\operatorname{Re}m_{sc}^2(z)|\big\}, \qquad (3.49)$$

and we recall the following elementary lemma that was proven in [23, Lemma 4.8].

Lemma 3.7 The matrix $I-m_{sc}^2(z)B$ is invertible on the subspace orthogonal to $\mathbf{e}$. Let $\mathbf{u}$ be a vector which is orthogonal to $\mathbf{e}$ and let

$$\mathbf{w}=(I-m_{sc}^2(z)B)\mathbf{u};$$

then

$$\|\mathbf{u}\|_\infty\le\frac{C\log N}{q(z)}\|\mathbf{w}\|_\infty$$

for some constant $C$ that only depends on $\delta_-$ in (2.3). □
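On the subspace orthogonal to $\mathbf{e}$, the equation $(I-m_{sc}^2B)\mathbf{u}=\mathbf{w}$ can be solved by the Neumann series $\mathbf{u}=\sum_{k\ge0}(m_{sc}^2B)^k\mathbf{w}$ whenever $|m_{sc}|^2$ times the norm of $B$ restricted to $\mathbf{e}^\perp$ is below 1. The sketch below assumes a hypothetical circulant variance profile $\sigma_{ij}^2=N^{-1}\big(1+\frac12\cos(2\pi(i-j)/N)\big)$, which satisfies (A) and (C) and has spectrum $\{1,\frac14,0\}$, so that $\delta_+=\frac34$. It illustrates the setting of the lemma only; the lemma itself covers general $B$ and gives the $\log N/q$ bound.

```python
import cmath
import math

def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with |m| < 1 (Stieltjes transform of the semicircle)."""
    s = cmath.sqrt(z * z - 4)
    m = (-z + s) / 2
    return m if abs(m) < 1 else (-z - s) / 2

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]

N = 30
B = [[(1 + 0.5 * math.cos(2 * math.pi * (i - j) / N)) / N for j in range(N)] for i in range(N)]
m2 = m_sc(complex(1.0, 0.01)) ** 2

# w is orthogonal to e = (1, ..., 1): both trigonometric sums vanish over a full period.
w = [math.sin(6 * math.pi * i / N) + 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]

# Fixed-point iteration u <- w + m^2 B u, i.e. partial sums of the Neumann series.
u = list(w)
for _ in range(200):
    u = [wi + m2 * bu for wi, bu in zip(w, mat_vec(B, u))]

residual = max(abs(ui - m2 * bu - wi) for ui, bu, wi in zip(u, mat_vec(B, u), w))
q = max(0.75, abs(1 - m2.real))     # q(z) from (3.49) with delta_+ = 3/4
ratio = max(abs(x) for x in u) / max(abs(x) for x in w)
print(residual, abs(sum(u)), ratio, 1 / q)
```

Because $B$ is symmetric with $B\mathbf{e}=\mathbf{e}$, each iterate stays orthogonal to $\mathbf{e}$, which the printed value of $|\sum_iu_i|$ confirms.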

The following lemma estimates the deviation of $v_i$ from its average $[v]$:

Lemma 3.8 Suppose that $4\le\ell\le C\log N/\log\log N$. Fix the spectral parameter $z\in S_\ell$; we will omit it from the notation. Suppose that in some set $\Xi$ it holds that

$$\Lambda_d\le\frac{q(z)}{(\log N)^{3/2}}; \qquad (3.50)$$

then in the set $\Xi\cap\Omega^c\cap\mathcal{B}^c$ we have

$$\max_i|v_i-[v]|\le\frac{C\log N}{q}\Big(\Lambda^2+\Psi+\frac{(\log N)^2}{q^2}\Psi^2\Big)\le\frac{C\log N}{q^3}(\Lambda^2+\Psi) \qquad (3.51)$$

for some constant $C$ depending only on $\delta_-$ and for sufficiently large $N$.

Proof. For $z\in S_\ell$, $q(z)$ and $\operatorname{Im}m_{sc}(z)$ are bounded. Combining (3.50) with the definitions of $\Psi(z)$ and $S_\ell$ and with $\ell\ge 4$, we obtain that $\Lambda_d(z)$ and $\Lambda(z)$ are bounded by $C(\log N)^{-3/2}$ and $\Psi(z)$ is bounded by $C(\log N)^{-2}$. Thus the expansion (3.48) holds true in the set $\Xi\cap\Omega^c\cap\mathcal{B}^c$, by using (3.33). We can estimate the second and third order terms in (3.48) by $C(\Psi+\Lambda_d)^2$ and we obtain

$$v_i=m_{sc}^2\sum_j\sigma_{ij}^2v_j+\varepsilon_i, \qquad \text{with } \varepsilon_i=O(\Psi)+O(\Lambda_d^2) \text{ in } \Xi\cap\Omega^c\cap\mathcal{B}^c. \qquad (3.52)$$

Taking the average over $i$, we have

$$(1-m_{sc}^2)[v]=\frac1N\sum_i\varepsilon_i=O(\Psi)+O(\Lambda_d^2),$$

and thus it follows from (3.52) that

$$v_i-[v]=m_{sc}^2\sum_j\sigma_{ij}^2(v_j-[v])+O(\Psi)+O(\Lambda_d^2).$$

Applying Lemma 3.7 to $\mathbf{u}=\mathbf{v}-[v]\mathbf{e}$, we obtain

$$\max_i|v_i-[v]|\le\frac{C\log N}{q}(\Lambda_d^2+\Psi), \qquad (3.53)$$

hence

$$\Lambda_d\le\Lambda+\frac{C\log N}{q}(\Lambda_d^2+\Psi).$$

With (3.50), this inequality implies

$$\Lambda_d\le\Lambda+\frac{C\log N}{q}(\Lambda^2+\Psi). \qquad (3.54)$$

Using (3.54) to bound $\Lambda_d$ in (3.53), we have proved the first inequality of (3.51); the second one follows from $\Psi\le C(\log N)^{-2}$. This completes the proof of Lemma 3.8. □

In this paper we assumed that the positive constants $\delta_\pm$ are independent of $N$ (see (2.3)); thus $q\ge\delta_+$ is bounded from below, the condition (3.50) is automatically satisfied in the set $\mathcal{B}^c$ (see (3.20)), and therefore (3.51) can be written as

$$\max_i|v_i-[v]|\le C(\log N)(\Lambda^2+\Psi) \quad \text{in } \Omega^c\cap\mathcal{B}^c, \qquad (3.55)$$

in particular,

$$\Lambda_d\le\Lambda+C^*(\log N)(\Lambda^2+\Psi) \quad \text{in } \Omega^c\cap\mathcal{B}^c, \qquad (3.56)$$

with some constants $C$, $C^*$ depending only on $\delta_\pm$.

Lemma 3.9 Suppose that $4\le\ell\le C\log N/\log\log N$. Fix the spectral parameter $z\in S_\ell$; we will omit it from the notation. Then in the set $\Omega^c\cap\mathcal{B}^c$ we have

$$(1-m_{sc}^2)[v]=m_{sc}^3[v]^2+m_{sc}^2[Z]+O\Big(\frac{\Lambda^2}{\log N}\Big)+O\big((\log N)^2\Psi^2\big), \qquad (3.57)$$

where $[Z]:=N^{-1}\sum_{i=1}^NZ_i$. The implicit constants in the error terms depend only on $\delta_\pm$ and $C_0$.

Proof. From the choice $\ell\ge 4$, and from $\Lambda\le(\log N)^{-2}$ in the set $\mathcal{B}^c$, we have

$$\Psi\le(\log N)^{-8}. \qquad (3.58)$$

Moreover, for $z\in S_\ell$ we have $\operatorname{Im}m_{sc}(z)\ge c\eta$ with some universal positive constant $c$ (see Lemma 3.4); hence we also have

$$\Psi\ge\frac{c(\log N)^{\ell}}{\sqrt N}. \qquad (3.59)$$

By the definition (3.17) of $\Upsilon_i$ and by the estimates (3.24) and (3.33), we have

$$\Upsilon_i=A_i+h_{ii}-Z_i=h_{ii}-Z_i+O(\Lambda_o^2+N^{-1})=h_{ii}-Z_i+O(\Psi^2) \quad \text{in } \Omega^c\cap\mathcal{B}^c. \qquad (3.60)$$

The size of the last term of (3.48) is less than $O(\Psi^3+\Lambda_d^3)$, which is bounded by $O(\Psi^2+\Lambda^3)$ using (3.56) and (3.58). Thus we have, from (3.33) and (3.48),

$$v_i=m_{sc}^2\Big(\sum_j\sigma_{ij}^2v_j+Z_i-h_{ii}+O(\Psi^2)\Big)+m_{sc}^3\Big(\sum_j\sigma_{ij}^2v_j+O(\Psi)\Big)^2+O(\Psi^2+\Lambda^3) \quad \text{in } \Omega^c\cap\mathcal{B}^c. \qquad (3.61)$$

Summing over $i$ and dividing by $N$, we obtain

$$[v]=m_{sc}^2[v]+m_{sc}^2[Z]+O(\Psi^2+\Lambda^3)+\frac{m_{sc}^3}{N}\sum_i\Big(\sum_j\sigma_{ij}^2v_j+O(\Psi)\Big)^2 \quad \text{in } \Omega^c\cap\mathcal{B}^c. \qquad (3.62)$$

Here we used that in the set $\Omega^c\cap\mathcal{B}^c\subset\Omega_h^c$ we have $N^{-1}|\sum_ih_{ii}|\le(\log N)^{\ell/10}N^{-1}\le\Psi^2$ by (3.59). Writing $v_j=(v_j-[v])+[v]$, the last term in (3.62) can be estimated using (3.55):

$$\frac1N\sum_i\Big(\sum_j\sigma_{ij}^2v_j+O(\Psi)\Big)^2=[v]^2+O\big((\log N)^2(\Lambda^2+\Psi)^2\big)+O(\Lambda\Psi)+O(\Psi^2).$$

Collecting the various error terms and using (3.58) and $\Lambda\le(\log N)^{-2}$ in $\mathcal{B}^c$, we obtain (3.57) from (3.62). This completes the proof of Lemma 3.9.

□



3.4 Dichotomy estimate for $\Lambda$

Throughout this section we fix the parameter $\ell$ with $4\le\ell\le C\log N/\log\log N$. By Lemma 3.9 we have that in $\Omega^c\cap\mathcal{B}^c$

$$(1-m_{sc}^2)[v]-m_{sc}^3[v]^2=O(\Psi)+O(\Lambda^2)/\log N, \qquad (3.63)$$

where we have used the simple bound $\Psi\le(\log N)^{-2}$ (see (3.58)) and that in the set $\Omega(z)^c\cap\mathcal{B}(z)^c$ all $Z_i$, hence $[Z]$, can be bounded by $\Psi$ (see (3.35) and the definition of $\Omega_d$).
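We introduce the following notation:

$$\alpha:=\Big|\frac{1-m_{sc}^2}{m_{sc}^3}\Big|, \qquad \beta:=\frac{(\log N)^{2\ell}}{(N\eta)^{1/3}}, \qquad \text{with } \eta=\operatorname{Im}z, \qquad (3.64)$$

where $\alpha=\alpha(z)$ and $\beta=\beta(z)$ depend on the spectral parameter $z$. For any $z\in S_\ell$ we have the bound $\beta(z)\le(\log N)^{-4}$, by $\ell\ge 4$. From Lemma 3.4 it also follows that there is a universal constant $K\ge 1$ such that

$$\frac1K\sqrt{\kappa+\eta}\le\alpha(z)\le K\sqrt{\kappa+\eta} \qquad (3.65)$$

for any $z\in S_\ell$.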

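By the definition (3.26) of $\Psi=\Psi(z)$, we have

$$\Psi=(\log N)^{\ell}\sqrt{\frac{\Lambda+\operatorname{Im}m_{sc}}{N\eta}}\le(\log N)^{\ell}\,\frac{\Lambda+\operatorname{Im}m_{sc}}{(N\eta)^{1/3}}+(\log N)^{\ell}(N\eta)^{-2/3}\le\beta\Lambda+\alpha\beta+\beta^2, \qquad (3.66)$$

where, in the last step, we have used that $\alpha(z)\sim\sqrt{\kappa+\eta}$, see (3.65), and thus $\operatorname{Im}m_{sc}(z)\le C\alpha(z)$ (see Lemma 3.4). We conclude from (3.63) and $|m_{sc}|\sim 1$ that

$$\Big|\frac{1-m_{sc}^2}{m_{sc}^3}[v]-[v]^2\Big|\le C^*(\beta\Lambda+\alpha\beta+\beta^2)+O(\Lambda^2)/\log N \quad \text{in } \Omega^c\cap\mathcal{B}^c \qquad (3.67)$$

with some constant $C^*$.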
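The elementary inequality (3.66) can be spot-checked numerically. The sketch below evaluates $\Psi$, $\alpha$ and $\beta$ from (3.26) and (3.64) on a grid of $(E,\eta,\Lambda)$ values and counts violations of $\Psi\le\beta\Lambda+\alpha\beta+\beta^2$. For floating-point convenience it uses the illustrative values $N=10^9$ and $\ell=1$, which lie outside the range $\ell\ge 4$ of the lemmas; the inequality itself only rests on $\sqrt{ab}\le a+b$ and $\operatorname{Im}m_{sc}\le(\log N)^{\ell}\alpha$.

```python
import cmath
import math

def m_sc(z):
    """Root of m^2 + z m + 1 = 0 with |m| < 1 (Stieltjes transform of the semicircle)."""
    s = cmath.sqrt(z * z - 4)
    m = (-z + s) / 2
    return m if abs(m) < 1 else (-z - s) / 2

def check_366(N, ell, E, eta, lam):
    """Return (Psi, beta*Lambda + alpha*beta + beta^2) for the given parameters."""
    L = math.log(N)
    m = m_sc(complex(E, eta))
    psi = L ** ell * math.sqrt((lam + m.imag) / (N * eta))   # (3.26)
    alpha = abs((1 - m * m) / m ** 3)                         # (3.64)
    beta = L ** (2 * ell) / (N * eta) ** (1 / 3)              # (3.64)
    return psi, beta * lam + alpha * beta + beta ** 2

N, ell = 10 ** 9, 1
violations = 0
for a in range(-20, 21):
    for eta in (1e-5, 1e-3, 1e-1, 1.0, 10.0):
        for lam in (0.0, 1e-4, 1e-2, 1.0):
            psi, rhs = check_366(N, ell, a / 4, eta, lam)
            violations += psi > rhs
print("violations:", violations)
```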

Neglecting the error term and replacing $[v]$ by $\Lambda$, we roughly have the inequality

$$|\alpha\Lambda-\Lambda^2|\le C^*(\beta\Lambda+\alpha\beta+\beta^2). \qquad (3.68)$$

This inequality provides certain estimates on $\Lambda$ depending on whether $\alpha\le\beta$ or not.

Since $\alpha$ and $\beta$ are functions of $z$ ($\beta(z)$ depends only on $\eta=\operatorname{Im}z$), we will fix $E=\operatorname{Re}z$ and vary $\eta=\operatorname{Im}z$ from $\eta=10$ down to $\eta=(\log N)^{10\ell}/N$. Thanks to (3.65), $\alpha(z)$ is essentially monotone increasing in $\eta$, up to universal constants. The function $\beta(z)$ is monotonically decreasing. Therefore there exists a threshold $\tilde\eta$ such that for $\eta\le\tilde\eta$ we have $\alpha\le\beta$ and for $\eta\ge\tilde\eta$ we have $\alpha\ge\beta$. To implement precisely the idea of dividing the estimate according to the relative size of $\alpha$ and $\beta$, we will need to choose a large but fixed constant $U\ge 1$ depending only on $C^*$. Let $\tilde\eta=\tilde\eta(U,E)$ be the solution to $\sqrt{\kappa+\eta}=2U^2K\beta(z)$, where $\kappa=\big||E|-2\big|$. Note that up to a constant factor, this equation is the same as $\alpha(z)=\beta(z)$. Since $\sqrt{\kappa+\eta}$ is increasing while $\beta(z)$ is decreasing in $\eta$, the solution is unique and one can easily prove that

$$\tilde\eta\le N^{-1/3} \qquad (3.69)$$

for sufficiently large $N$, depending on $U$. The implementation of this idea and precise estimates on $\Lambda$ are given by the following lemma:

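Lemma 3.10 [Dichotomy Lemma] Suppose that $4\le\ell\le C\log N/\log\log N$. Then there is a constant $U_0=U_0(\delta_\pm,C_0)\ge 1$ such that for any $U\ge U_0$ there exists a constant $C_1(U)$, depending only on $U$, such that for any spectral parameter $z\in S_\ell$ the following estimates hold:

$$\Lambda(z)\le U\beta(z) \quad \text{or} \quad \Lambda(z)\ge\frac{\alpha(z)}{U} \qquad \text{if } \operatorname{Im}z\ge\tilde\eta(U,\operatorname{Re}z), \qquad (3.70)$$

$$\Lambda(z)\le C_1(U)\beta(z) \qquad \text{if } \operatorname{Im}z<\tilde\eta(U,\operatorname{Re}z), \qquad (3.71)$$

in the set $\Omega(z)^c\cap\mathcal{B}(z)^c$ and for any sufficiently large $N\ge N_0(\delta_\pm,C_0)$.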

Proof. We set $U_0=9(C^*+1)$ and let $U\ge U_0$, where $C^*$ is the constant appearing in (3.67). Depending on the relative size of $\beta$ and $\alpha$, which is determined by $z$, we will express either $[v]$ or $[v]^2$ from (3.67). This will correspond to the two cases in Lemma 3.10. Recalling that $|[v]|=\Lambda$, the last error term in (3.67) can be easily absorbed for sufficiently large $N$, and we will get a quadratic inequality for $\Lambda$.

Case 1: $\eta=\operatorname{Im}z\ge\tilde\eta(U,E)$. By the definition of $\tilde\eta$, in this case $\sqrt{\kappa+\eta}\ge 2U^2K\beta(z)$, i.e.,

$$\alpha(z)\ge 2U^2\beta(z) \qquad (3.72)$$

by (3.65). From the choice of $U_0$ and $U\ge U_0$ we get $\alpha\ge\beta$ and $\frac14\alpha\ge C^*\beta$. Expressing $[v]$ from (3.67) and absorbing the $C^*\beta\Lambda$ term into the left hand side, we obtain

$$\frac12\alpha\Lambda\le 2\Lambda^2+2C^*\alpha\beta. \qquad (3.73)$$

Thus either

$$\frac14\alpha\Lambda\le 2\Lambda^2,$$

i.e. $\Lambda\ge\alpha/8$, which is larger than $\alpha/U$, or

$$\frac14\alpha\Lambda\le 2C^*\alpha\beta,$$

i.e. $\Lambda\le 8C^*\beta\le U\beta$, which proves (3.70).

Case 2: $\eta=\operatorname{Im}z<\tilde\eta(U,E)$. In this case $\sqrt{\kappa+\eta}<2U^2K\beta(z)$, i.e., $\alpha(z)\le 2U^2K^2\beta(z)$. We express $[v]^2$ from (3.67) and we get

$$\Lambda^2\le 2\alpha\Lambda+2C^*\big[\beta\Lambda+\beta\alpha+\beta^2\big]\le C'\beta\Lambda+C'\beta^2 \qquad (3.74)$$

with a constant $C'$ depending on $U$. This quadratic inequality immediately implies that $\Lambda\le C_1(U)\beta$ with some $U$-dependent constant $C_1(U)$. Hence we have proved Lemma 3.10.

□
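The case analysis in Case 1 can be checked by brute force: for fixed $\alpha$, $\beta$ and $C^*$ satisfying (3.72), every $\Lambda$ obeying the quadratic inequality (3.73) lies either below $U\beta$ or above $\alpha/U$. The snippet scans a grid of $\Lambda$ values; the numbers $C^*=1$ and $\beta=10^{-4}$ are illustrative assumptions, and $U$ is taken equal to $U_0=9(C^*+1)$ as in the proof.

```python
c_star = 1.0
U = 9 * (c_star + 1)           # U_0 = 9(C* + 1) from the proof of Lemma 3.10
beta = 1e-4
alpha = 2 * U ** 2 * beta      # boundary case of (3.72)

def allowed(lam):
    """Quadratic inequality (3.73): (1/2) alpha Lambda <= 2 Lambda^2 + 2 C* alpha beta."""
    return 0.5 * alpha * lam <= 2 * lam ** 2 + 2 * c_star * alpha * beta

small = large = violations = 0
for k in range(100001):
    lam = k * 1e-6             # Lambda in [0, 0.1]
    if allowed(lam):
        if lam <= U * beta:
            small += 1         # first alternative of (3.70)
        elif lam >= alpha / U:
            large += 1         # second alternative of (3.70)
        else:
            violations += 1    # would contradict the dichotomy
print(small, large, violations)
```

Both alternatives actually occur, and the forbidden window between $U\beta$ and $\alpha/U$ stays empty; this gap is what the continuity argument of Section 3.6 exploits.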

3.5 Initial estimates for large $\eta$

In this section we show that Theorem 3.1 holds for $\eta=\operatorname{Im}z=10$, i.e. on the upper boundary of $S_\ell$. This will serve as the initial step for the continuity argument. The proof for $\eta=10$ is similar to the arguments in Sections 3.2 and 3.3 but much easier. In particular, neither an a priori assumption similar to (3.20) nor the bad set $\mathcal{B}$ is necessary. We start with the analogue of Lemma 3.6, which actually holds uniformly for any $z$ with $0<\eta=\operatorname{Im}z\le 10$ and not only for $z\in S_\ell$. Note that these estimates are very weak for small $\eta$, but we will use them only for $\eta=10$.

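Lemma 3.11 For any $z\in\mathbb{C}$ with $0<\eta=\operatorname{Im}z\le 10$, define the exceptional events

$$\Theta_d(z):=\Big\{\max_i|Z_i(z)|\ge\frac{(\log N)^{\ell}}{\eta\sqrt N}\Big\}, \qquad \Theta_o(z):=\Big\{\max_{i\ne j}|Z_{ij}^{(ij)}(z)|\ge\frac{(\log N)^{\ell}}{\eta\sqrt N}\Big\},$$

$$\Theta(z):=\Omega_h\cup\Theta_d(z)\cup\Theta_o(z), \qquad (3.75)$$

where we recall the definition of $\Omega_h$ in (3.27). Then there exist constants $0<\phi<1$, $C\ge 1$, $c>0$, depending on $\vartheta$ (2.17), such that for any $\ell$ with $4/\phi\le\ell\le C\log N/\log\log N$ and for any such $z$ we have

$$\mathbb{P}(\Theta(z))\le C\exp\big[-c(\log N)^{\phi\ell}\big], \qquad (3.76)$$

and the pointwise bound

$$\max_i|\Upsilon_i(z)|\le CN^{-1/3}\eta^{-3} \quad \text{in } \Theta(z)^c \qquad (3.77)$$

for sufficiently large $N\ge N_0(\vartheta,C_0)$. Furthermore, for $\eta\ge 3$ we have the estimate

$$\Lambda_d(z)+\Lambda_o(z)\le CN^{-1/3} \quad \text{in } \Theta(z)^c \qquad (3.78)$$

for sufficiently large $N\ge N_0(\vartheta,C_0)$.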
Proof. Given the estimate (3.34), for the proof of (3.76) it is sufficient to estimate the probabilities of $\Theta_d$ and $\Theta_o$. The estimate (3.39) still holds, but we can now bound the last term in (3.39) simply by

$$\frac{C}{N^2\eta}\sum_{k\ne i}\operatorname{Im}G^{(i)}_{kk}\le\frac{C}{N\eta^2} \qquad (3.79)$$

for any $z$, using the trivial deterministic estimate

$$|G_{ij}^{(\mathbb{T})}|\le\eta^{-1} \qquad (3.80)$$

that holds for any $i$, $j$ and for any $\mathbb{T}$. Combining (3.79) with the large deviation bound (3.10) from Lemma 3.3 as in (3.38), we obtain $\mathbb{P}(\Theta_d)\le C\exp[-c(\log N)^{\phi\ell}]$. The same argument holds for the exceptional set $\Theta_o$ involving the off-diagonal elements, and this proves (3.76).

From (3.5) and the trivial estimate (3.80), we can estimate the off-diagonal term $G_{ij}$ in the set $\Theta(z)^c$ by

$$|G_{ij}|=|G_{ii}||G^{(i)}_{jj}||K^{(ij)}_{ij}|\le\eta^{-2}\big(|h_{ij}|+|Z^{(ij)}_{ij}|\big)\le\frac{C(\log N)^{\ell}}{\eta^2\sqrt{N\eta^2}}$$

for sufficiently large $N$. Moreover, the same argument gives

$$\Big|\frac{G_{ij}}{G_{ii}}\Big|=|G^{(i)}_{jj}||K^{(ij)}_{ij}|\le N^{-1/3}\eta^{-2},$$

which can be inserted, together with (3.80), into the definition (3.15) of $A_i$; we get

$$|A_i|\le\sigma_{ii}^2|G_{ii}|+\sum_{j\ne i}\sigma_{ij}^2|G_{ji}|\Big|\frac{G_{ij}}{G_{ii}}\Big|\le\frac{C}{N\eta}+\frac{1}{N^{1/3}\eta^3}\le\frac{C}{N^{1/3}\eta^3}, \qquad \Lambda_o\le\frac{C(\log N)^{\ell}}{\eta^2\sqrt{N\eta^2}}\le\frac{C}{N^{1/3}\eta^3} \qquad (3.81)$$

for sufficiently large $N$. In the set $\Theta^c$ a similar bound holds for $h_{ii}$ and $Z_i$, using $\eta\le 10$. Recalling that $\Upsilon_i=A_i+h_{ii}-Z_i$, this proves (3.77).

For the proof of (3.78) it is sufficient to bound only $\Lambda_d$; the necessary estimate for $\Lambda_o$ is given in (3.81). We define $T=\max_i|\Upsilon_i|$ and note that for $\eta\ge 3$ we have $T\le CN^{-1/3}$ in the set $\Theta^c$ by (3.77). From the self-consistent equation (3.18) and the defining equation (2.12) of $m_{sc}$, we have

$$v_n=\frac{\sum_j\sigma_{nj}^2v_j-\Upsilon_n}{(z+m_{sc})\big(z+m_{sc}+\sum_j\sigma_{nj}^2v_j-\Upsilon_n\big)}, \qquad 1\le n\le N. \qquad (3.82)$$

Using $|G_{ii}|\le\eta^{-1}$ from (3.80) and $|m_{sc}(z)|=\big|\int\varrho_{sc}(x)(x-z)^{-1}dx\big|\le\eta^{-1}$, we obtain for $\eta\ge 3$ that

$$\Lambda_d=\max_i|v_i|\le 2/\eta\le 2/3. \qquad (3.83)$$

By (3.12), we have $|z+m_{sc}(z)|=|m_{sc}(z)|^{-1}\ge 3$. Together with (3.83), we obtain from (3.82) that

$$|v_n|\le\frac13\,\frac{\max_j|v_j|}{|z+m_{sc}(z)|-\max_j|v_j|}+O(T). \qquad (3.84)$$

Maximizing over $n$, we have

$$\Lambda_d=\max_n|v_n|\le\frac13\,\frac{\Lambda_d}{|z+m_{sc}|-\Lambda_d}+O(T). \qquad (3.85)$$

Since the denominator satisfies $|z+m_{sc}(z)|-\Lambda_d\ge 3-2/3=7/3$ by $\Lambda_d\le 2/3$, the estimate (3.78) follows from (3.85) and (3.77). This completes the proof of Lemma 3.11. □
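The last step of the proof can be spelled out as follows: inserting the lower bound $7/3$ for the denominator into (3.85) gives a linear inequality for $\Lambda_d$, whose solution is controlled by $T\le CN^{-1/3}$ from (3.77).

```latex
\Lambda_d \le \frac{1}{3}\cdot\frac{\Lambda_d}{7/3} + O(T) = \frac{\Lambda_d}{7} + O(T)
\quad\Longrightarrow\quad
\Lambda_d \le \frac{7}{6}\,O(T) \le C N^{-1/3}
\qquad (\eta \ge 3,\ \text{in } \Theta(z)^c).
```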
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3.6 Continuity argument : conclusion of the proof of Theorem 3.1 

Fix an energy E with $|E| \le 5$ and choose a decreasing finite sequence $\eta_k \in S_\ell$, $k = 1, 2, \dots, k_0$, with $k_0 \le CN^8$ such that $|\eta_k - \eta_{k+1}| \le N^{-8}$, $\eta_1 = 10$ and $\eta_{k_0} = N^{-1}(\log N)^{10\ell}$. Denote $z_k = E + i\eta_k$. We will first show that Theorem 3.1 holds for any $z = z_k$.

Throughout this section fix any $U \ge U_0$ from Lemma 3.10 and recall the definition of $\tilde\eta(U,E)$ from before this lemma. Consider first the case of $z_1$. Since $\eta_1 \ge \tilde\eta(U,E)$, see (3.69), we are in the first case (3.70) of Lemma 3.10. By Lemma 3.11, we have $\Lambda_d(z_1) + \Lambda_o(z_1) \le CN^{-1/3}$ in the set $\Theta(z_1)^c$; in particular, $\Theta(z_1)^c \subset B(z_1)^c$. Moreover, since $\Lambda(z_1) \le CN^{-1/3}$ in the set $\Theta(z_1)^c$, by (3.65) the second alternative of (3.70) cannot hold, and therefore $\Lambda(z_1) \le U\beta(z_1)$ in the set $\Theta(z_1)^c \cap \Omega(z_1)^c \cap B(z_1)^c = \Theta(z_1)^c \cap \Omega(z_1)^c$. Using the probability estimates (3.32) and (3.76), we have proved that

$$\mathbb P\big(\Lambda(z_1) \ge U\beta(z_1)\big) \le C\exp[-c(\log N)^{\phi\ell}], \qquad \mathbb P\big(B(z_1)\big) \le C\exp[-c(\log N)^{\phi\ell}]. \tag{3.86}$$

For a general k we have the following:



Lemma 3.12 There exist constants $0 < \phi < 1$, $C' > 1$, $c > 0$, depending on $\vartheta$, such that if $\ell$ satisfies $4/\phi \le \ell \le C\log N/\log\log N$ and $U$ is chosen $U \ge U_0(\delta_\pm, C_0)$ (see Lemma 3.10), then the following hold for any $k \le k_0$ and for any sufficiently large $N \ge N_0(\vartheta, \delta_\pm, C_0, U)$.

Case 1. If $\eta_k \ge \tilde\eta(U,E)$, then

$$\mathbb P\big(\Lambda(z_k) \ge U\beta(z_k)\big) \le C'k\exp[-c(\log N)^{\phi\ell}] \quad\text{and}\quad \mathbb P\big(B(z_k)\big) \le C'k\exp[-c(\log N)^{\phi\ell}]. \tag{3.87}$$

Case 2. If $\eta_k < \tilde\eta(U,E)$, then

$$\mathbb P\big(\Lambda(z_k) \ge C_1(U)\beta(z_k)\big) \le C'k\exp[-c(\log N)^{\phi\ell}] \quad\text{and}\quad \mathbb P\big(B(z_k)\big) \le C'k\exp[-c(\log N)^{\phi\ell}], \tag{3.88}$$

where $C_1(U)$ is given in Lemma 3.10.

Proof. We proceed by induction on k; the case $k = 1$ has been checked in (3.86). First consider Case 1, when $k < k_0$ is such that $\eta_k \ge \tilde\eta(U,E)$, i.e. (3.87) holds by the induction hypothesis. By the definition of the sequence $z_k$, we have

$$|G_{ij}(z_k) - G_{ij}(z_{k+1})| \le |z_k - z_{k+1}|\sup_{z \in S_\ell}\Big|\frac{\partial G_{ij}(z)}{\partial z}\Big| \le N^{-8}\sup_{z \in S_\ell}\frac{1}{|\operatorname{Im} z|^2} \le N^{-6} \tag{3.89}$$



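for any i, j. Hence $|\Lambda(z_k) - \Lambda(z_{k+1})| \le N^{-6} \le \frac12 U\beta(z_{k+1})$, and, since also $\beta(z_k) \le \beta(z_{k+1})$ as $\eta_{k+1} < \eta_k$, we get

$$\mathbb P\Big(\Lambda(z_{k+1}) \ge \tfrac32 U\beta(z_{k+1})\Big) \le C'k\exp[-c(\log N)^{\phi\ell}]. \tag{3.90}$$

In other words, the estimate on $\Lambda(z_{k+1})$ deteriorates by a factor 3/2, but this factor will be regained by the dichotomy estimate in Lemma 3.10.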

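Using (3.89) we also have, in $\Omega(z_k)^c \cap B(z_k)^c$,

$$\begin{aligned}
\Lambda_d(z_{k+1}) + \Lambda_o(z_{k+1}) &\le \Lambda_d(z_k) + \Lambda_o(z_k) + 2N^{-6} \\
&\le (\log N)^{\ell}\big(\Lambda(z_k)^2 + \Psi(z_k)\big) + \Lambda(z_k) + 2N^{-6} \\
&\le (\log N)^{\ell}\Big(\frac{U\beta(z_k) + \operatorname{Im} m_{sc}(z_k)}{N\eta_k}\Big)^{1/2} + 2U\beta(z_k) + 2N^{-6}.
\end{aligned} \tag{3.91}$$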
Here in the second line we used the bounds (3.33) and (3.56), which hold on the set $\Omega(z_k)^c \cap B(z_k)^c$; in the last line we used $\Lambda(z_k) \le U\beta(z_k) \le (\log N)^{-\ell}$. All these estimates hold on an event with probability at least $1 - C'(k + \frac12)\exp[-c(\log N)^{\phi\ell}]$, using (3.32) and the estimate on $\mathbb P(B(z_k))$ from (3.87). Here we assumed that the constant $C'$ is larger than twice the constant $C$ in (3.32).

By the choice of $\ell \ge 4$ and the definition of $\beta$ from (3.64), the last line of (3.91) is bounded by $(\log N)^{-2}$, and thus we have

$$\mathbb P\big(B(z_{k+1})\big) \le C'(k + \tfrac12)\exp[-c(\log N)^{\phi\ell}]. \tag{3.92}$$

Suppose now that $k+1$ falls into the first case, $\eta_{k+1} \ge \tilde\eta(U,E)$. Then, from (3.72),

$$\frac32 U\beta(z_{k+1}) < \frac{\alpha(z_{k+1})}{U},$$

so by the dichotomy estimate (3.70), the bound $\Lambda(z_{k+1}) \le \frac32 U\beta(z_{k+1})$ from (3.90) implies $\Lambda(z_{k+1}) \le U\beta(z_{k+1})$ on the set $\Omega(z_{k+1})^c \cap B(z_{k+1})^c$. Thus (3.32), (3.90) and (3.92) imply that

$$\mathbb P\big(\Lambda(z_{k+1}) \ge U\beta(z_{k+1})\big) \le C'(k+1)\exp[-c(\log N)^{\phi\ell}] \tag{3.93}$$

by using $C' \ge 2C$, where $C$ is the constant from (3.32). This proves (3.87), i.e. the induction step, if $\eta_{k+1}$ is in the first case. If $\eta_{k+1}$ falls into the second case, i.e., $\eta_{k+1} < \tilde\eta(U,E)$, then (3.90) directly gives the induction step, i.e. (3.88) for $k+1$.

So far we considered Case 1, i.e., we assumed that $\eta_k \ge \tilde\eta(U,E)$. Now consider Case 2, when $\eta_k < \tilde\eta(U,E)$, so that the induction hypothesis is (3.88). The argument is very similar to the previous case, but $U\beta(z_k)$ is replaced with $C_1(U)\beta(z_k)$ everywhere in (3.90) and (3.91), and we still obtain (3.92). Since $\eta_{k+1} < \eta_k < \tilde\eta(U,E)$, we can directly refer to (3.71) to obtain the induction step, i.e. (3.88) for $k+1$. This completes the proof of Lemma 3.12. □

Choosing a sufficiently large but fixed $U$, e.g. $U = U_0(\delta_\pm, C_0)$, we have thus proved that $\Lambda(z_k) \le C\beta(z_k)$ for all $k \le k_0$ with a constant depending on $\delta_\pm$ and $C_0$; in particular $\Psi(z_k) \le C\beta(z_k)$ by the definition of $\Psi$ (3.26). Using (3.33) and (3.56), we have proved Theorem 3.1 for all $z_k$, $k \le k_0$, and any fixed energy E with $|E| \le 5$. For any $z = E + i\eta \in S_\ell$ there is a $z_k = E + i\eta_k$ with $|z - z_k| \le N^{-8}$. Using the Lipschitz continuity of $G_{ij}(z)$ and $m_{sc}(z)$ with Lipschitz constant at most $N^2$, we easily conclude the proof of Theorem 3.1 for any $z \in S_\ell$. Note that in order to accommodate the higher $(\log N)$-power in $\beta$ and the additional logarithmic factors in (3.33) and (3.56) within the final formulation of Theorem 3.1, we needed to redefine $\ell \to \ell/3$, which results in a decreased $\phi$ in the final statement. □
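The Lipschitz step (3.89), and the final passage from the grid $z_k$ to all of $S_\ell$, rest on the resolvent identity $G(z) - G(z') = (z - z')G(z)G(z')$ together with $\|G(z)\| \le 1/\operatorname{Im} z$. Below is a minimal numerical sketch of this bound in Python, assuming a GOE-type matrix with entry variance $1/N$ as a stand-in for a generalized Wigner matrix; the function names are hypothetical helpers, not notation from the paper.

```python
import numpy as np


def goe_matrix(n: int, rng: np.random.Generator) -> np.ndarray:
    """Real symmetric matrix with off-diagonal variance 1/n (GOE normalization)."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2 * n)


def resolvent(h: np.ndarray, z: complex) -> np.ndarray:
    """G(z) = (H - z)^{-1}."""
    return np.linalg.inv(h - z * np.eye(h.shape[0]))


def lipschitz_check(h: np.ndarray, E: float, eta1: float, eta2: float):
    """Return (max_ij |G_ij(z1) - G_ij(z2)|, |z1 - z2| / (eta1 * eta2))."""
    z1, z2 = complex(E, eta1), complex(E, eta2)
    g1, g2 = resolvent(h, z1), resolvent(h, z2)
    # Resolvent identity: G(z1) - G(z2) = (z1 - z2) G(z1) G(z2).
    assert np.allclose(g1 - g2, (z1 - z2) * g1 @ g2, atol=1e-8)
    # Each entry is bounded by the operator norm, and ||G(z)|| <= 1 / Im z.
    return np.max(np.abs(g1 - g2)), abs(z1 - z2) / (eta1 * eta2)


rng = np.random.default_rng(0)
h = goe_matrix(200, rng)
diff, bound = lipschitz_check(h, E=0.5, eta1=0.05, eta2=0.04)
print(f"max entry difference {diff:.3e} <= bound {bound:.3e}")
```

With $\operatorname{Im} z \ge N^{-1}$ the bound $|z - z'|/(\eta\eta')$ gives the Lipschitz constant $N^2$ quoted above.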

4 Optimal error bound in the strong local semicircle law 

We have proved Theorem 3.1, which is weaker than the main result Theorem 2.1 but will be used as an a priori bound for the improvement. The key ingredient for the stronger result is the following lemma, which shows that $[Z]$, the average of the $Z_i$'s, is much smaller than the size of a typical $Z_i$. (Notice that in the proof of Theorem 3.1, $[Z]$ was estimated in (3.63) by the same quantity, $\Psi$, as each individual $Z_i$.)
For $z \in S_\ell$ define

$$\Gamma = \Gamma(z) := \Omega_h(z) \cup B(z), \qquad \Delta = \Delta(z) := \Omega(z) \cup B(z), \tag{4.1}$$
where $\Omega_h$, $\Omega$ were defined in (3.27)-(3.28) and $B$ was given in (3.19). Recall that $\Omega_h$ and $\Omega$ depend on $\ell$, and thus $\Gamma$ and $\Delta$ also depend on $\ell$, but we omit this from the notation. We remark that Theorem 3.1 shows that there exists a positive constant $\phi > 0$ such that for any $4/\phi \le \ell \le \log N/\log\log N$ we have

$$\mathbb P\big(B(z)\big) = \mathbb P\big(\Lambda_d(z) + \Lambda_o(z) \ge (\log N)^{-2}\big) \le C\exp[-c(\log N)^{\phi\ell}], \qquad z \in S_\ell, \tag{4.2}$$

since the error bar $(\log N)^{\ell}/(N\eta)^{1/3}$ in Theorem 3.1 is much smaller than $(\log N)^{-2}$. Combining (4.2) with (3.32) and $\Gamma \subset \Delta$, we get that

$$\mathbb P\big(\Gamma(z)\big) \le \mathbb P\big(\Delta(z)\big) \le C\exp[-c(\log N)^{\phi\ell}], \qquad z \in S_\ell, \tag{4.3}$$

with positive constants $C$, $c$ depending only on $\vartheta$ in (2.17), $\delta_\pm$ from Assumption (B) and $C_0$ from Assumption (C).

With this notation, and recalling that $\Lambda_o(z) = \max_{i \ne j}|G_{ij}(z)|$, we have the following lemma, whose proof will be given separately in Section 7.

Lemma 4.1 There exist positive constants $D > 1$, $A_0 > 1$, and $\psi < \min\{1/10, \phi\}$, depending on $\vartheta$, such that for any $\ell$ with

$$A_0\log\log N \le \ell \le \frac{\log N}{\log\log N}, \tag{4.4}$$

for any positive even number $p \le (\log N)^{\psi\ell - 2}$ and for any fixed $z \in S_\ell$ we have

$$\mathbb E\,\mathbf 1(\Gamma^c(z))\Big|\frac1N\sum_{i=1}^N Z_i(z)\Big|^{p} \le (Dp)^{Dp}\,\mathbb E\,\mathbf 1(\Gamma^c(z))\Big[\Lambda_o(z)^2 + \frac1N\Big]^{p} \tag{4.5}$$

for any sufficiently large $N \ge N_0(A_0, \psi)$.

The first version of this lemma was presented in Lemma 5.2 of [23], where the p-dependence of the constant in (4.5) was not carefully tracked and the effect of the exceptional event $\Gamma$ was estimated less precisely. This was sufficient since in [23] we applied the result for an exponent p independent of N; as a consequence, in particular, the probability estimates for the local semicircle law were only power law and not subexponential in N as here. In the current paper we allow p to depend on N, which requires the more precise form stated in Lemma 4.1. Furthermore, here we give a new proof that relies on a different organization of partially independent terms. The main difference is that here we separate dependences on individual matrix elements, while in [23] we separated entire rows and columns. The new method is therefore more robust, but combinatorially more demanding.

Recalling the notation

$$[Z] = [Z](z) = \frac1N\sum_{i=1}^N Z_i(z),$$

we will apply Lemma 4.1 in the following form:

Corollary 4.2 There exist positive constants $D > 1$, $A_0 > 1$, and $\psi < \min\{1/10, \phi, 1/D\}$, depending on $\vartheta$, such that for any $\ell$ satisfying (4.4), for any positive even number $p \le (\log N)^{\psi\ell - 2}$ and for any fixed $z \in S_\ell$ (2.20), we have for any set $\Xi$ in the probability space

$$\mathbb E\,\mathbf 1(\Gamma^c)\,|[Z](z)|^p \le \mathbb E\,\mathbf 1(\Gamma^c \cap \Xi^c)\,\Psi(z)^{2p} + (Dp)^{Dp}\big[\mathbb P(\Omega(z)) + \mathbb P(\Xi)\big], \tag{4.6}$$

where $\Omega(z)$ is defined in Lemma 3.6.



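Proof. On the right-hand side of (4.5) we can split the set $\Gamma^c$ as

$$\Gamma^c = \Omega_h^c \cap B^c = [\Omega^c \cap B^c \cap \Xi^c] \cup [\Omega^c \cap B^c \cap \Xi] \cup [(\Omega_h^c \setminus \Omega^c) \cap B^c].$$

On the set $[\Omega^c \cap B^c \cap \Xi] \cup [(\Omega_h^c \setminus \Omega^c) \cap B^c] \subset B^c$, we estimate $\Lambda_o$ trivially by

$$\Lambda_o \le (\log N)^{-2} \le 1. \tag{4.7}$$

Since $\Omega_h^c \setminus \Omega^c \subset \Omega$, and on these sets the bracket in (4.5) is at most 2 (the resulting factor $2^p$ is absorbed by increasing $D$), we have

$$\mathbb E\,\mathbf 1(\Gamma^c)\,|[Z](z)|^p \le (Dp)^{Dp}\,\mathbb E\,\mathbf 1(\Omega^c \cap B^c \cap \Xi^c)\Big[\Lambda_o(z)^2 + \frac1N\Big]^p + (Dp)^{Dp}\big[\mathbb P(\Xi) + \mathbb P(\Omega)\big]. \tag{4.8}$$

Choosing $\psi < 1/D$ we see that $(Dp)^D \le (\log N)^{\ell}$. Thus we can use $(\log N)^{\ell}N^{-1} \le C\Psi^2(z)$ for $z \in S_\ell$ (by $\operatorname{Im} m_{sc}(z) \ge c\eta$) and $(\log N)^{\ell}\Lambda_o^2 \le \Psi^2$ on $\Omega^c \cap B^c$, see (3.33), to absorb the $(Dp)^{Dp}$ prefactor in the first term of (4.8). This concludes the proof of Corollary 4.2. □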



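Lemma 4.3 Fix two numbers $\ell$ and $L$ that satisfy $4 \le \ell \le L \le \frac{\log N}{\log\log N}$; in particular $S_L \subset S_\ell$. Let $0 < \tau \le 1$ be an arbitrary constant. For any $z = E + i\eta$ define

$$\gamma = \gamma(z) := \frac{(\log N)^{3\ell+2}}{(N\eta)^{\tau}}. \tag{4.9}$$

Suppose that for all $z \in S_L$ we have

$$\Lambda(z) \le \gamma(z) \tag{4.10}$$

and

$$|[Z](z)| \le (\log N)^{\ell}\,\frac{\gamma(z) + \operatorname{Im} m_{sc}(z)}{N\eta}. \tag{4.11}$$

Suppose that $\Lambda(z) = o(1)$ for $\eta = 10$, $|E| \le 5$. Then in the set $\Omega^c \cap B^c$ we have

$$\Lambda(z) \le (\log N)^{3\ell+2}(N\eta)^{-(\tau+1)/2} \tag{4.12}$$

for any $z \in S_L$. Furthermore, if $\Lambda(z) \le \alpha(z)/2$ and (4.11) hold for some $z \in S_L$, then

$$\Lambda(z) \le C(\log N)^{3\ell+1}\,\frac{\gamma(z) + \operatorname{Im} m_{sc}(z)}{\alpha(z)N\eta} \tag{4.13}$$

in the set $\Omega^c \cap B^c$, where $\alpha$ was defined in (3.64).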



Proof. In the first part of the proof $z \in S_L$ is fixed, so we drop the $z$-dependence of various quantities. Recall (3.64), (3.65) and Lemma 3.4 for $m_{sc}$ and $\alpha \sim \sqrt{\kappa + \eta}$. From Lemma 3.9 and using (4.11), in the set $\Omega^c \cap B^c$ we have, with $w := [v]$, the estimate

$$\frac{1 - m_{sc}^2}{m_{sc}}\,w - w^2 = O\Big(\frac{w^2}{\log N}\Big) + O\Big((\log N)^{3\ell+1}\,\frac{\gamma + \operatorname{Im} m_{sc}}{N\eta}\Big), \tag{4.14}$$
where we have used (4.10), the definition of $\Psi$ (3.26) and that $|w| = \Lambda$. We can complete the square on the left side and obtain the inequality

$$\Lambda \le 2\alpha + C(\log N)^{(3\ell+1)/2}\Big(\frac{\gamma + \alpha}{N\eta}\Big)^{1/2}, \tag{4.15}$$

where we have used that $\operatorname{Im} m_{sc} \le C\alpha$. We claim that in fact

$$\Lambda \le 2\alpha + C(\log N)^{(3\ell+1)/2}\Big(\frac{\gamma}{N\eta}\Big)^{1/2} \tag{4.16}$$

also holds; indeed, this is trivial if $\Lambda \le 2\alpha$, and if $\Lambda > 2\alpha$ then by assumption (4.10) $\gamma \ge \Lambda > 2\alpha$, so $\alpha$ can be absorbed into $\gamma$ in (4.15).

Define

$$\alpha_0 = \alpha_0(z) := T(\log N)^{(3\ell+2)/2}\Big(\frac{\gamma}{N\eta}\Big)^{1/2} = T(\log N)^{3\ell+2}(N\eta)^{-(\tau+1)/2} \tag{4.17}$$

with a large parameter $T$ (independent of N) to be specified later, and note that $\alpha_0 \le \gamma$ for sufficiently large N.

Suppose that $\Lambda \le \alpha/2$. In this case the $w^2$ terms are smaller than the leading term $\alpha w$ on the left-hand side of (4.14); therefore we can express $|w| = \Lambda$ and estimate it by

$$\Lambda \le C(\log N)^{3\ell+1}\,\frac{\gamma + \operatorname{Im} m_{sc}}{\alpha N\eta} \le C(\log N)^{3\ell+1}\Big(\frac{\gamma}{\alpha N\eta} + \frac{1}{N\eta}\Big). \tag{4.18}$$

In the second step we also used $\operatorname{Im} m_{sc} \le C\alpha$. In particular, the first inequality proves (4.13).

Assume now that $\Lambda \le \alpha/2$ and $\alpha \ge \alpha_0$. Plugging the lower bound $\alpha \ge \alpha_0$ from (4.17) into (4.18) and using the definition of $\gamma$, we obtain

$$\Lambda \le CT^{-1}(\log N)^{(3\ell+2)/2}\Big(\frac{\gamma}{N\eta}\Big)^{1/2} = CT^{-2}\alpha_0. \tag{4.19}$$

Choosing $T$ as a sufficiently large constant, we obtain that

$$\Lambda \le \frac{\alpha}{4} \tag{4.20}$$

under the condition that $\Lambda \le \alpha/2$ and $\alpha \ge \alpha_0$. Therefore, as long as $\alpha \ge \alpha_0$, we have a dichotomy: either $\Lambda > \alpha/2$ or $\Lambda \le \alpha/4$.

We now fix E and continuously decrease $\eta$ from $\eta = 10$ to $\eta = N^{-1}(\log N)^{10L}$, the lower endpoint of $S_L$. Since $\Lambda(z) \ll 1$ and $\alpha(z)$ is bounded away from zero for $\eta = 10$, $|E| \le 5$, we know that $\Lambda \le \alpha/2$ holds for $\eta = 10$. Since $\Lambda(z)$ is a continuous function of $\eta$, by the dichotomy we have $\Lambda \le \alpha/4$ for all $\eta$ as long as $\alpha \ge \alpha_0$. In particular, $\Lambda \le CT^{-2}\alpha_0$ from (4.19), which proves (4.12) in the case $\alpha \ge \alpha_0$.

Finally, for $\alpha < \alpha_0$, we can estimate $\Lambda$ directly via (4.16), and this proves that

$$\Lambda \le C(\log N)^{(3\ell+2)/2}\Big(\frac{\gamma}{N\eta}\Big)^{1/2}, \tag{4.21}$$

from which (4.12) follows, and we have thus completed the proof. □
Proof of Theorem 2.1. First we explain the idea. We will prove, by induction on the exponent $\tau$, that $\Lambda \le (N\eta)^{-\tau}$ holds, modulo logarithmic factors, with high probability. Notice that we proved this statement for $\tau = 1/3$ in Theorem 3.1. Lemma 4.3 asserts that if this statement is true for some $\tau$, then it also holds for $\frac{\tau+1}{2}$, assuming a bound on $[Z]$. This bound can be obtained from Corollary 4.2 with high probability. Repeating the induction step $O(\log\log N)$ times, we obtain that $\tau$ is essentially one, i.e. we get Theorem 2.1. However, we have to keep track of the increasing logarithmic factors and the deteriorating probability estimates of the exceptional sets.

Throughout the proof we fix $L$ satisfying (2.18) with the constant $A_0$ obtained from Corollary 4.2, and we also fix $\psi$ from the same corollary. We will also use a moving exponent $\ell$ whose value will always satisfy $L/2 \le \ell \le L$; in particular $S_L \subset S_\ell$.

We recall the definition

$$\gamma = \gamma(z,\tau,\ell) := \frac{(\log N)^{3\ell+2}}{(N\eta)^{\tau}}, \tag{4.22}$$

where we now emphasize the dependence on $\tau$ and $\ell$. Define the events

$$R_{\tau,\ell} := \bigcup_{z \in S_L} R_{\tau,\ell}(z), \qquad R_{\tau,\ell}(z) := \{\Lambda(z) > \gamma(z,\tau,\ell)\}. \tag{4.23}$$

Then (3.1) in Theorem 3.1 states that there is a $\psi$ with $0 < \psi < 1/10$ such that for $\ell_0 := L$ we have

$$\mathbb P(R_{\tau_0,\ell_0}) \le \exp[-(\log N)^{\psi\ell_0}], \tag{4.24}$$

with $\tau_0 = 1/3$ and for any $N \ge N_0(\vartheta, \delta_\pm, C_0)$. Notice that we have used a weaker form of Theorem 3.1 by making the threshold $\gamma$ larger, the restrictions on $\ell$ stronger, and reducing the exponent $\phi$ to $\psi$, since this weaker form will be preserved in the iterative procedure. By setting a sufficiently large lower threshold on N, we could remove the constants C, c from (3.1). The general iteration step is included in the following lemma.

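Lemma 4.4 There exists a sufficiently large $N_0 = N_0(\vartheta, \delta_\pm, C_0)$ such that for any $N \ge N_0$ the following implication holds. If for some $0 < \tau \le 1$ and for some $\ell$ with $L/2 \le \ell \le L$

$$\mathbb P(R_{\tau,\ell}) \le \exp[-(\log N)^{\psi\ell}], \tag{4.25}$$

then

$$\mathbb P(R_{\tau',\ell'}) \le \exp[-(\log N)^{\psi\ell'}], \tag{4.26}$$

where

$$\tau' = \frac{\tau + 1}{2}, \qquad \ell' = \ell - \frac{3}{\psi}. \tag{4.27}$$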
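Proof. Define

$$\Phi = \Phi(z,\tau,\ell) := \Big((\log N)^{\ell}\,\frac{\gamma(z,\tau,\ell) + \operatorname{Im} m_{sc}(z)}{N\eta}\Big)^{1/2}. \tag{4.28}$$

Fix $z \in S_L$; then from Corollary 4.2 with the choice $\Xi = R_{\tau,\ell}$ we have

$$\mathbb E\,\mathbf 1(\Gamma^c)\,|[Z]|^p \le \mathbb E\,\mathbf 1(\Gamma^c \cap R_{\tau,\ell}^c)\,\Psi^{2p} + (Dp)^{Dp}\big[\mathbb P(\Omega) + \mathbb P(R_{\tau,\ell})\big] \tag{4.29}$$

$$\qquad\le \Phi^{2p} + (Dp)^{Dp}\exp[-c(\log N)^{\psi\ell}], \tag{4.30}$$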
where we have used (4.25) and (3.32) to bound the probabilities of $\Xi$ and $\Omega$, and we used that $\Lambda \le \gamma$ on $R_{\tau,\ell}^c$ to estimate $\Psi \le \Phi$. We will choose $p = (\log N)^{\alpha}$ with

$$\alpha = \psi\ell - 3. \tag{4.31}$$

From Markov's inequality and (4.3) we obtain that

$$\begin{aligned}
\mathbb P\big(|[Z]| \ge \tfrac12(\log N)\Phi^2\big) &\le 2^p(\log N)^{-p}\Phi^{-2p}\big\{\Phi^{2p} + (Dp)^{Dp}\exp[-c(\log N)^{\psi\ell}]\big\} + C\exp[-c(\log N)^{\phi\ell}] \\
&\le 2^p(\log N)^{-p} + \exp\big[Dp\log(2Dp) + p\log N - c(\log N)^{\alpha+3}\big] + C\exp[-c(\log N)^{\phi\ell}] \\
&\le \exp[-3(\log N)^{\alpha}].
\end{aligned} \tag{4.32}$$

Here in the second line we used $\Phi \ge N^{-1/2}$, which follows from $\operatorname{Im} m_{sc}(z) \ge c\eta$, to estimate $\Phi^{-2p}$. In the final estimate we used that $\log p = \alpha\log\log N \le \psi\ell\log\log N \le \psi\log N$ and that $\psi < \phi$. This estimate was for any fixed $z \in S_L$. By choosing a grid of $z$-values in $S_L$ with spacing of order $N^{-c}$, with some large $c$, we can use the Lipschitz continuity of $[Z](z)$ and $\Phi(z)$ to conclude that essentially the same estimate holds simultaneously for all $z \in S_L$.

Combining this with (4.25), we have

$$|[Z]| \le (\log N)\Phi^2 = (\log N)^{\ell+1}\,\frac{\gamma + \operatorname{Im} m_{sc}}{N\eta} \quad\text{and}\quad \Lambda \le \gamma \tag{4.33}$$

for all $z \in S_L$ with probability at least $1 - \exp[-2(\log N)^{\alpha}]$. We can now apply Lemma 4.3, so that

$$\Lambda(z) \le (\log N)^{3\ell+2}(N\eta)^{-(\tau+1)/2} \tag{4.34}$$

holds for any $z \in S_L$ with probability bigger than $1 - \exp[-(\log N)^{\alpha}]$. Here we have used that $\mathbb P(\Omega \cup B) \le \exp[-2(\log N)^{\alpha}]$ from (4.3). We have thus proved (4.26) and Lemma 4.4. □

Returning to the proof of Theorem 2.1, we choose $\tau_0 = 1/3$ and $\ell_0 = L$ as the initial values of the iteration. The input condition (4.25) in Lemma 4.4 for the initial step has been checked in (4.24). Iterating Lemma 4.4 yields a sequence $(\tau_n, \ell_n)$ with $\tau_{n+1} = \tau_n'$ and $\ell_{n+1} = \ell_n'$ via (4.27); more precisely

$$\tau_n = 1 - 2^{-n}\cdot\frac23 \ge 1 - 2^{-n}, \qquad \ell_n = L - \frac{3n}{\psi},$$

such that

$$\mathbb P\Big(\bigcup_{z \in S_L}\Big\{\Lambda(z) > \frac{(\log N)^{3\ell_n+2}}{(N\eta)^{\tau_n}}\Big\}\Big) \le \exp[-(\log N)^{\psi\ell_n}]. \tag{4.35}$$

We run the iteration until $n = 2\log\log N$, so that

$$(N\eta)^{2^{-n}} \le N^{2^{-n}} \le e.$$

If $A_0 = 20/\psi$, i.e. $L \ge (20/\psi)\log\log N$, then $\ell_n \ge 2L/3$, and thus

$$\mathbb P\Big(\bigcup_{z \in S_L}\Big\{\Lambda(z) > \frac{e(\log N)^{3L+2}}{N\eta}\Big\}\Big) \le \exp[-(\log N)^{2\psi L/3}]. \tag{4.36}$$

This proves (2.19) after renaming $2\psi/3$ to a new $\phi$. The proof of (2.21) follows from the estimate on $\Lambda$, from (3.33), (3.56) and (4.3).
Finally, to prove (2.22), we need the following lemma.

Lemma 4.5 Let $L \ge 4$ satisfy (2.18) and define the set

$$U_L := \big\{z = E + i\eta : 5 \ge |E| \ge 2 + N^{-2/3}(\log N)^{8L+8},\ \eta = N^{-2/3}(\log N)^{2L+1}\big\}. \tag{4.37}$$

Then for $A_0$ large enough in (2.18), we have

$$\mathbb P\Big(\bigcap_{z \in U_L}\big\{\Lambda(z) \le (\log N)^{-1}(N\eta)^{-1}\big\}\Big) \ge 1 - C\exp[-c(\log N)^{\phi L/2}]. \tag{4.38}$$

Proof of Lemma 4.5. For $z \in U_L$ we have $\kappa = |E| - 2 \ge N^{-2/3}(\log N)^{8L+8} \ge \eta$, and thus (see (3.65))

$$\alpha(z) \ge c\sqrt{\kappa + \eta} \ge cN^{-1/3}(\log N)^{4L+4}.$$

Therefore $\Lambda \le \alpha/2$ holds on the event $\Lambda(z) \le \frac{e(\log N)^{3L+2}}{N\eta}$ for any $z \in U_L$. Since $U_L \subset S_L$, the probability of this event is bigger than $1 - \exp[-(\log N)^{2\psi L/3}]$ by (4.36). Combining this bound on $\Lambda$ with the estimate (3.32) for $\ell = L$, we know that

$$|[Z](z)| \le (\log N)^{2L}\,\frac{\gamma(z) + \operatorname{Im} m_{sc}(z)}{N\eta}$$

holds with probability bigger than $1 - 2\exp[-(\log N)^{2\psi L/3}]$. Here we used $\gamma(z) = (\log N)^{3L+2}(N\eta)^{-1}$ with the choice $\tau = 1$ and $\ell = L$, see (4.22).

We can now use (4.13) from Lemma 4.3 with $\ell = L$ and $\tau = 1$ to obtain

$$\Lambda \le C(\log N)^{3L+1}\,\frac{(\log N)^{3L+2}(N\eta)^{-1} + \operatorname{Im} m_{sc}}{\alpha N\eta} \tag{4.39}$$

with probability larger than $1 - 3\exp[-(\log N)^{2\psi L/3}]$. Here we used the probability estimate (4.3) on $\mathbb P(\Omega \cup B)$ and the first bound in (3.14). Then, using the values of $\kappa$ and $\eta$ in the set (4.37), we obtain

$$\Lambda \le (\log N)^{-1}(N\eta)^{-1}$$

from (4.39), and this proves Lemma 4.5. □

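We now prove (2.22). On the set $U_L$ we have

$$\operatorname{Im} m_{sc} = O\Big(\frac{\eta}{\sqrt\kappa}\Big) \le (\log N)^{-1}(N\eta)^{-1}. \tag{4.40}$$

Combining it with (4.38), we obtain that

$$\mathbb P\Big(\bigcap_{z \in U_L}\big\{\operatorname{Im} m(z) \le 2(\log N)^{-1}(N\eta)^{-1}\big\}\Big) \ge 1 - C\exp[-c(\log N)^{\phi L/2}]. \tag{4.41}$$

Fix $z = E + i\eta \in U_L$ and define the event

$$W(z) := \{\exists j : |\lambda_j - E| \le \eta\}.$$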
Recalling the definition of $m$,

$$\operatorname{Im} m(z) = \frac1N\sum_{j=1}^N\frac{\eta}{(E - \lambda_j)^2 + \eta^2}, \tag{4.42}$$

it is clear that $\operatorname{Im} m(z) \ge \frac12(N\eta)^{-1}$ on the set $W(z)$. Using (4.41) we obtain that

$$\mathbb P\big(\exists j : 2 + N^{-2/3}(\log N)^{8L+8} \le |\lambda_j| \le 5\big) \le C\exp[-c(\log N)^{\phi L/2}]. \tag{4.43}$$

Finally, we need to control the probability of a very large eigenvalue. For example, the following (not optimal) estimate was proved in, e.g., Lemma 7.2 of [22]. We formulate the result for the largest eigenvalue $\lambda_N$, but analogous results hold for the smallest eigenvalue $\lambda_1$ as well.

Lemma 4.6 Let H satisfy Assumptions (A), (B), (C) and the subexponential decay condition (2.17). Then for some $\varepsilon > 0$, depending on $\vartheta$, we have

$$\mathbb P(\lambda_N \ge K) \le e^{-N^{\varepsilon}\log K} \tag{4.44}$$

for any $K \ge 3$.

Combining this lemma with (4.43), we complete the proof of (2.22). □

5 Estimates on the location of eigenvalues 

Proof of Theorem 2.2. We now translate the information on the Stieltjes transform obtained in Theorem 2.1 to prove Theorem 2.2 on the location of the eigenvalues. We will need the following Lemma 5.1, which is a special case of Lemma 6.1 proved in [23] with the choice A = 0. The conditions (6.1) and (6.2) stated in Lemma 6.1 of [23] are not sufficient. Instead, the following slightly stronger assumption is necessary:

$$|m^\Delta(x + iy)| \le \frac{CU}{Ny} \qquad\text{for } 1 \ge y > 0,\ |x| \le K + 1, \tag{5.1}$$

i.e., it is not sufficient to control only the imaginary part of $m^\Delta$. This stronger condition is needed in (6.7) of [23], where the imaginary part of $m^\Delta$ is changed to its real part after an integration by parts. With the condition (5.1), the proof of Lemma 6.1 in [23] remains otherwise unchanged. This immediately proves the following lemma as a special case:

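Lemma 5.1 Let $\varrho^\Delta$ be a signed measure on the real line with $\operatorname{supp}\varrho^\Delta \subset [-K, K]$ for some fixed constant $K$. For any $E_1, E_2 \in [-3, 3]$ and $\eta > 0$ we define $f(\lambda) = f_{E_1,E_2,\eta}(\lambda)$ to be a characteristic function of $[E_1, E_2]$ smoothed on scale $\eta$, i.e., $f = 1$ on $[E_1, E_2]$, $f = 0$ on $\mathbb R \setminus [E_1 - \eta, E_2 + \eta]$, and $|f'| \le C\eta^{-1}$, $|f''| \le C\eta^{-2}$. Let $m^\Delta$ be the Stieltjes transform of $\varrho^\Delta$. Suppose for some positive number $U$ (which may depend on N) we have

$$|m^\Delta(x + iy)| \le \frac{CU}{Ny} \qquad\text{for } 1 \ge y > 0,\ |x| + y \le K. \tag{5.2}$$

Then

$$\Big|\int f_{E_1,E_2,\eta}(\lambda)\,\varrho^\Delta(\lambda)\,d\lambda\Big| \le \frac{CU|\log\eta|}{N} \tag{5.3}$$

with some constant $C$ depending on $K$.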
We will apply this lemma with the choice that the signed measure is the difference of the empirical density and the semicircle law,

$$\varrho^\Delta(d\lambda) = \varrho(d\lambda) - \varrho_{sc}(\lambda)\,d\lambda, \qquad \varrho(d\lambda) := \frac1N\sum_i\delta(\lambda_i - \lambda)\,d\lambda.$$

First we prove (2.26). Choose $L := A_0\log\log N$, where $A_0$ is given in Theorem 2.1, and define

$$T_N := (\log N)^{L} = (\log N)^{A_0\log\log N}$$

for simplicity. By Theorem 2.1, the assumptions of Lemma 5.1 hold for the difference $m^\Delta = m - m_{sc}$ with $K = 10$ and $U = T_N^4$ if $y \ge y_0 := T_N^4/N$. For $y < y_0$, set $z = x + iy$, $z_0 = x + iy_0$ and estimate

$$|m(z) - m_{sc}(z)| \le |m(z_0) - m_{sc}(z_0)| + \int_y^{y_0}\big|\partial_\eta\big(m(x + i\eta) - m_{sc}(x + i\eta)\big)\big|\,d\eta. \tag{5.4}$$



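Note that

$$|\partial_\eta m(x + i\eta)| = \Big|\frac1N\sum_j\partial_\eta G_{jj}(x + i\eta)\Big| \le \frac1N\sum_{j,k}|G_{jk}(x + i\eta)|^2 = \frac{1}{N\eta}\sum_j\operatorname{Im} G_{jj}(x + i\eta) = \frac1\eta\operatorname{Im} m(x + i\eta), \tag{5.5}$$

and similarly

$$|\partial_\eta m_{sc}(x + i\eta)| = \Big|\int\frac{\varrho_{sc}(s)}{(s - x - i\eta)^2}\,ds\Big| \le \int\frac{\varrho_{sc}(s)}{|s - x - i\eta|^2}\,ds = \frac1\eta\operatorname{Im} m_{sc}(x + i\eta). \tag{5.6}$$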

Now we use the fact that the functions $y \mapsto y\operatorname{Im} m(x + iy)$ and $y \mapsto y\operatorname{Im} m_{sc}(x + iy)$ are monotone increasing for any $y > 0$, since both are Stieltjes transforms of a positive measure. Therefore the integral in (5.4) can be bounded by

$$\int_y^{y_0}\frac{d\eta}{\eta}\big[\operatorname{Im} m(x + i\eta) + \operatorname{Im} m_{sc}(x + i\eta)\big] \le y_0\big[\operatorname{Im} m(x + iy_0) + \operatorname{Im} m_{sc}(x + iy_0)\big]\int_y^{y_0}\frac{d\eta}{\eta^2}. \tag{5.7}$$

By definition, $\operatorname{Im} m_{sc}(x + iy_0) \le |m_{sc}(x + iy_0)| \le C$. By the choice of $y_0$ and Theorem 2.1, we have

$$\operatorname{Im} m(x + iy_0) \le \operatorname{Im} m_{sc}(x + iy_0) + \frac{T_N^4}{Ny_0} \le C \tag{5.8}$$

with very high probability. Together with (5.7) and (5.4), this proves that (5.2) holds for $y < y_0$ as well if $U$ is increased to $U = CT_N^4$.

The application of Lemma 5.1 shows that for any $\eta \ge 1/N$

$$\Big|\int f_{E_1,E_2,\eta}(\lambda)\,\varrho(\lambda)\,d\lambda - \int f_{E_1,E_2,\eta}(\lambda)\,\varrho_{sc}(\lambda)\,d\lambda\Big| \le \frac{C(\log N)T_N^4}{N}. \tag{5.9}$$

Since $y \mapsto y\operatorname{Im} m(x + iy)$ is monotone increasing for any $y > 0$, (5.8) implies a crude upper bound on the empirical density. Indeed, for any interval $I := [x - \eta, x + \eta]$ with $\eta = 1/N$, we have

$$n(x + \eta) - n(x - \eta) \le C\eta\operatorname{Im} m(x + i\eta) \le Cy_0\operatorname{Im} m(x + iy_0) \le \frac{CT_N^4}{N}. \tag{5.10}$$
This bound can be used to estimate the difference between the characteristic function of the interval $[E_1, E_2]$ and the smoothed function $f_{E_1,E_2,\eta}$.
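The crude density bound (5.10) only uses two facts: every eigenvalue within distance $\eta$ of $x$ contributes at least $1/(2N\eta)$ to $\operatorname{Im} m(x + i\eta) = \frac1N\sum_j \eta/((\lambda_j - x)^2 + \eta^2)$, so the number of such eigenvalues is at most $2N\eta\operatorname{Im} m(x + i\eta)$; and $y \mapsto y\operatorname{Im} m(x + iy)$ is increasing. The Python sketch below checks both on a GOE-type sample, which we use only as an assumed stand-in for the ensembles of the paper.

```python
import numpy as np


def im_m(eigs: np.ndarray, x: float, eta: float) -> float:
    """Im m(x + i eta) = (1/N) sum_j eta / ((lambda_j - x)^2 + eta^2)."""
    return float(np.mean(eta / ((eigs - x) ** 2 + eta ** 2)))


rng = np.random.default_rng(1)
n = 400
a = rng.normal(size=(n, n))
eigs = np.linalg.eigvalsh((a + a.T) / np.sqrt(2 * n))

for x in np.linspace(-2.5, 2.5, 51):
    for eta in (1 / n, 5 / n, 0.1):
        count = np.count_nonzero(np.abs(eigs - x) <= eta)
        # Each eigenvalue in [x - eta, x + eta] contributes >= 1/(2 N eta) to Im m.
        assert count <= 2 * n * eta * im_m(eigs, x, eta) + 1e-9
    # y -> y * Im m(x + iy) is increasing: each summand y^2 / ((lambda - x)^2 + y^2) is.
    ys = np.geomspace(1 / n, 1.0, 30)
    vals = [y * im_m(eigs, x, y) for y in ys]
    assert all(v2 >= v1 - 1e-12 for v1, v2 in zip(vals, vals[1:]))
```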

Since the probability of having eigenvalues outside the interval $[-3, 3]$ is extremely small, we consider only the case that all eigenvalues are inside $[-3, 3]$. Let $E_1 := -4$ and $E_2 := E \in [-3, 3]$. Then from (5.9) and (5.10) we have that

$$|n(E) - n_{sc}(E)| \le \frac{C(\log N)T_N^4}{N} \tag{5.11}$$

holds for any fixed $E \in [-3, 3]$ with overwhelming probability. The supremum over E is a standard argument for extremely small events and we omit the details. This completes the proof of (2.26) after possibly increasing $L$ (hence $A_0$) and decreasing $\phi$ in order to replace $(\log N)T_N^4$ with $(\log N)^L$.

Now we turn to the proof of (2.25). Let $L$ be as before. Fix any $1 \le j \le N/2$ and let $E = \gamma_j$, $E' = \lambda_j$. Setting $t_N = (\log N)T_N^4 = (\log N)^{4L+1}$ for simplicity, from (5.11) we have

$$n_{sc}(E) = n(E') = n_{sc}(E') + O(t_N/N). \tag{5.12}$$

Clearly $E \le 1$, and using (5.11), $E' \le 1$ also holds with overwhelming probability. First, using (2.22) and

$$n_{sc}(x) \sim (x + 2)^{3/2}, \qquad -2 \le x \le 1, \tag{5.13}$$

i.e.

$$n_{sc}(E) = n_{sc}(\gamma_j) = \frac{j}{N} \sim (E + 2)^{3/2},$$

we know that (2.25) holds (with a possibly increased power of $\log N$ on the left-hand side) if

$$E, E' \le -2 + t_N N^{-2/3}. \tag{5.14}$$

The correct power $(\log N)^L$ can be restored by increasing $L$ (hence $A_0$) and decreasing $\phi$, as before.

Hence, we can assume that one of $E$ and $E'$ is in the interval $[-2 + t_N N^{-2/3}, 1]$. With (5.13), this assumption implies that at least one of $n_{sc}(E)$ and $n_{sc}(E')$ is larger than $t_N^{3/2}/N$. Inserting this information into (5.12), we obtain that both $n_{sc}(E)$ and $n_{sc}(E')$ are positive and

$$n_{sc}(E) = n_{sc}(E')\big[1 + O(t_N^{-1/2})\big];$$

in particular, $E + 2 \sim E' + 2$. Using that $n_{sc}'(x) \sim (x + 2)^{1/2}$ for $-2 \le x \le 1$, we obtain that $n_{sc}'(E) \sim n_{sc}'(E')$, and in fact $n_{sc}'(E)$ is comparable with $n_{sc}'(E'')$ for any $E''$ between $E$ and $E'$. Then by Taylor expansion we have

$$|n_{sc}(E') - n_{sc}(E)| \ge c\,n_{sc}'(E)\,|E' - E|. \tag{5.15}$$

Since $n_{sc}'(E) = \varrho_{sc}(E) \sim \sqrt\kappa$ and $n_{sc}(E) \sim \kappa^{3/2}$ with $\kappa = E + 2$, and moreover $n_{sc}(E) = j/N$ by $E = \gamma_j$, we obtain from (5.12) and (5.15) that

$$|E' - E| \le \frac{C|n_{sc}(E') - n_{sc}(E)|}{n_{sc}'(E)} \le \frac{Ct_N}{Nn_{sc}'(E)} \le \frac{Ct_N}{N(n_{sc}(E))^{1/3}} \le \frac{Ct_N}{N^{2/3}j^{1/3}},$$

which proves (2.25), again after increasing $L$ and decreasing $\phi$ to achieve the claimed $(\log N)^L$ prefactor. This concludes the proof of Theorem 2.2.

□
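The classical locations $\gamma_j$, defined by $n_{sc}(\gamma_j) = j/N$, and the edge behaviour (5.13) are easy to compute explicitly, since on $[-2, 2]$ one has $n_{sc}(x) = \frac12 + \frac{x\sqrt{4 - x^2}}{4\pi} + \frac{\arcsin(x/2)}{\pi}$. The Python sketch below (the helper names are ours) inverts $n_{sc}$ by bisection, checks the asymptotics $n_{sc}(-2 + \kappa) \approx \frac{2}{3\pi}\kappa^{3/2}$, and compares a GOE-type sample with the rigidity scale $N^{-2/3}\min(j, N + 1 - j)^{-1/3}$ appearing in (2.25). It only illustrates the statement of Theorem 2.2 on one sample and proves nothing.

```python
import math

import numpy as np


def n_sc(x: float) -> float:
    """Semicircle counting function n_sc(x) = int_{-2}^{x} rho_sc(t) dt."""
    x = min(max(x, -2.0), 2.0)
    return 0.5 + x * math.sqrt(4 - x * x) / (4 * math.pi) + math.asin(x / 2) / math.pi


def gamma(j: int, n: int, iters: int = 100) -> float:
    """Classical location gamma_j solving n_sc(gamma_j) = j/N, by bisection."""
    lo, hi, target = -2.0, 2.0, j / n
    for _ in range(iters):
        mid = (lo + hi) / 2
        if n_sc(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rigidity_ratios(eigs: np.ndarray) -> np.ndarray:
    """|lambda_j - gamma_j| divided by N^{-2/3} min(j, N+1-j)^{-1/3}."""
    n = len(eigs)
    out = np.empty(n)
    for j in range(1, n + 1):
        scale = n ** (-2 / 3) * min(j, n + 1 - j) ** (-1 / 3)
        out[j - 1] = abs(eigs[j - 1] - gamma(j, n)) / scale
    return out


rng = np.random.default_rng(2)
N = 500
a = rng.normal(size=(N, N))
eigs = np.linalg.eigvalsh((a + a.T) / np.sqrt(2 * N))  # sorted increasingly
ratios = rigidity_ratios(eigs)
print("largest ratio:", ratios.max(), "median ratio:", np.median(ratios))
```

Theorem 2.2 predicts that these ratios stay below a power of $\log N$ with overwhelming probability; the printed values are sample-dependent.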
6 Edge Universality

In this section, we prove the edge universality, i.e., Theorem 2.4. At the end of Section 6.1 we will give a heuristic explanation of why matching the second moments is sufficient, but first we need some preparation and various notations. We will consider the largest eigenvalue $\lambda_N$, but the same argument applies to the lowest eigenvalue $\lambda_1$ as well.
For any $E_1 < E_2$ let

$$\mathcal N(E_1, E_2) := \#\{j : E_1 \le \lambda_j \le E_2\}$$

denote the number of eigenvalues in $[E_1, E_2]$. By Theorem 2.2 (rigidity of eigenvalues), there exist positive constants $A_0$, $\phi$, $C$ and $c$, depending only on $\vartheta$, $\delta_\pm$ and $C_0$, such that, setting

$$L := A_0\log\log N, \tag{6.1}$$

we have

$$\mathbb P\big\{N^{2/3}|\lambda_N - 2| \ge (\log N)^L\big\} \le C\exp[-c(\log N)^{\phi L}] \tag{6.2}$$

and

$$\mathbb P\big\{\mathcal N\big(2 - 2(\log N)^L N^{-2/3}, \infty\big) \ge (\log N)^{2L}\big\} \le C\exp[-c(\log N)^{\phi L}] \tag{6.3}$$

for sufficiently large $N \ge N_0(\vartheta, \delta_\pm, C_0)$. These estimates hold for both the v and w ensembles. Using these estimates, we can assume that $s$ in (2.41) satisfies

$$-(\log N)^L \le s \le (\log N)^L. \tag{6.4}$$

With $L$ from (6.1), we set

$$E_L := 2 + 2(\log N)^L N^{-2/3}. \tag{6.5}$$

For any $E \le E_L$ let

$$\chi_E := \mathbf 1_{[E, E_L]}$$

be the characteristic function of the interval $[E, E_L]$. For any $\eta > 0$ we define

$$\theta_\eta(x) := \frac{\eta}{\pi(x^2 + \eta^2)} = \frac1\pi\operatorname{Im}\frac{1}{x - i\eta} \tag{6.6}$$

to be an approximate delta function on scale $\eta$. In the following elementary lemma we compare the sharp counting function $\mathcal N(E, E_L) = \operatorname{Tr}\chi_E(H)$ with its approximation smoothed on scale $\eta$.

Lemma 6.1 Suppose that the assumptions of Theorem 2.4 hold and $L$, $\phi$ satisfy (6.2) and (6.3). For any $\varepsilon > 0$, set $\ell_1 := N^{-2/3 - 3\varepsilon}$ and $\eta := N^{-2/3 - 9\varepsilon}$. Then there exist constants $C$, $c$ such that for any $E$ satisfying

$$|E - 2|N^{2/3} \le \tfrac32(\log N)^L \tag{6.7}$$

we have

$$\mathbb P\Big\{|\operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E * \theta_\eta(H)| \le C\big(N^{-2\varepsilon} + \mathcal N(E - \ell_1, E + \ell_1)\big)\Big\} \ge 1 - C\exp[-c(\log N)^{\phi L}] \tag{6.8}$$

for sufficiently large N. This estimate holds for both the v and w ensembles.
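Proof of Lemma 6.1. By (6.5) and (6.7) we have

$$\eta \le \ell_1 \le E_L - E \le CN^{-2/3}(\log N)^L. \tag{6.9}$$

Since $\chi_E$ is the characteristic function of $[E, E_L]$, for any $x \in \mathbb R$ we have

$$|\chi_E(x) - \chi_E * \theta_\eta(x)| = \chi_E(x)\Big(\int_{-\infty}^{E} + \int_{E_L}^{\infty}\Big)\theta_\eta(x - y)\,dy + \big(1 - \chi_E(x)\big)\int_E^{E_L}\theta_\eta(x - y)\,dy.$$

Let $d = d(x) := |x - E| + \eta$ and $d_L = d_L(x) := |x - E_L| + \eta$. Using that $\int\theta_\eta = 1$ and the estimate

$$\int_a^\infty\theta_\eta(y)\,dy \le \frac{C\eta}{a + \eta}, \qquad a > 0,$$

an elementary calculation shows that

$$|\chi_E(x) - \chi_E * \theta_\eta(x)| \le C\eta\Big[\frac{E_L - E}{d_L(x)d(x)} + \frac{\chi_E(x)}{d_L(x) + d(x)}\Big] \tag{6.10}$$

for some constant $C > 0$. It is easy to check that if $\min\{d, d_L\} \le \ell_1$, then the right side of (6.10) is bounded by a constant, and if $\min\{d, d_L\} \ge \ell_1$, then it is less than $O(\eta/\ell_1) = O(N^{-6\varepsilon})$. Hence we have

$$|\operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E * \theta_\eta(H)| \le C\Big(\operatorname{Tr} f(H) + \frac{\eta}{\ell_1}\mathcal N(E, E_L) + \mathcal N(E - \ell_1, E + \ell_1) + \mathcal N(E_L - \ell_1, \infty)\Big), \tag{6.11}$$

where

$$f(x) := \frac{\eta(E_L - E)}{d_L(x)d(x)}\,\mathbf 1(x \le E - \ell_1). \tag{6.12}$$

With the assumption (6.7), $\mathcal N(E, E_L)$ and $\mathcal N(E_L - \ell_1, \infty)$ can be bounded by using (6.3) and (6.2). Hence it follows from (6.11) that

$$|\operatorname{Tr}\chi_E(H) - \operatorname{Tr}\chi_E * \theta_\eta(H)| \le C\big(\operatorname{Tr} f(H) + \mathcal N(E - \ell_1, E + \ell_1) + N^{-5\varepsilon}\big) \tag{6.13}$$

holds with probability larger than $1 - C\exp[-c(\log N)^{\phi L}]$, for some constants $C$ and $c$ and for sufficiently large N, uniformly in $E$ satisfying (6.7). Set

$$g(y) := \frac{1}{y^2 + \ell_1^2} \tag{6.14}$$

and notice that

$$\frac{1}{a^2} \le C(g * \theta_{\ell_1})(a) \qquad\text{if } |a| \ge \ell_1, \tag{6.15}$$

which implies

$$f(x) \le \frac{\eta(E_L - E)}{|E - x|^2}\,\mathbf 1(x \le E - \ell_1) \le C\eta(E_L - E)\,(g * \theta_{\ell_1})(E - x). \tag{6.16}$$

Recalling from (2.11) and (6.6) that

$$\frac1N\operatorname{Tr}\theta_{\ell_1}(H - E) = \frac{1}{\pi N}\operatorname{Im}\operatorname{Tr}\frac{1}{H - E - i\ell_1} = \frac1\pi\operatorname{Im} m(E + i\ell_1),$$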
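we obtain

$$\operatorname{Tr} f(H) \le CN\eta(E_L - E)\int\frac{\operatorname{Im} m(E - y + i\ell_1)}{y^2 + \ell_1^2}\,dy \le CN^{1/3}\eta(\log N)^L\int\frac{1}{y^2 + \ell_1^2}\Big[\operatorname{Im} m_{sc}(E - y + i\ell_1) + \frac{(\log N)^{CL}}{N\ell_1}\Big]dy, \tag{6.17}$$

where, by (2.19), the second inequality holds with probability larger than $1 - C\exp[-c(\log N)^{\phi L}]$, and we also used (6.9). The integral of the second term on the right-hand side is bounded by

$$CN^{1/3}\eta(\log N)^L\int\frac{1}{y^2 + \ell_1^2}\,\frac{(\log N)^{CL}}{N\ell_1}\,dy \le CN^{-2/3}\eta\,\ell_1^{-2}(\log N)^{CL} \le N^{-2\varepsilon} \tag{6.18}$$

by using the definitions of $\ell_1$ and $\eta$.

For the first term on the right-hand side of (6.17) we use the elementary estimate

$$\operatorname{Im} m_{sc}(E - y + i\ell_1) \le C\sqrt{\ell_1 + \big||E - y| - 2\big|}.$$

The integral over the region

$$A := \big\{y : \big||E - y| - 2\big| \ge \ell_1\big\}$$

can be bounded by

$$\int_A\frac{\operatorname{Im} m_{sc}(E - y + i\ell_1)}{y^2 + \ell_1^2}\,dy \le C\int_A\frac{\sqrt{||E - y| - 2|}}{y^2 + \ell_1^2}\,dy \le C\int\frac{\sqrt{|y|} + \sqrt{|E - 2|}}{y^2 + \ell_1^2}\,dy \le \frac{C}{\sqrt{\ell_1}} + \frac{C\sqrt{|E - 2|}}{\ell_1}.$$

On the complementary region we have

$$\int_{A^c}\frac{\operatorname{Im} m_{sc}(E - y + i\ell_1)}{y^2 + \ell_1^2}\,dy \le C\sqrt{\ell_1}\int\frac{dy}{y^2 + \ell_1^2} \le C\ell_1^{-1/2}.$$

Combining these estimates and using (6.7) together with the definitions of $\ell_1$ and $\eta$, we get

$$CN^{1/3}\eta(\log N)^L\int\frac{\operatorname{Im} m_{sc}(E - y + i\ell_1)}{y^2 + \ell_1^2}\,dy \le N^{-2\varepsilon},$$

and therefore, together with (6.18), we have $\operatorname{Tr} f(H) \le 2N^{-2\varepsilon}$. Considering (6.13), we have thus proved Lemma 6.1. □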
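For the Cauchy kernel (6.6) the smoothed counting function is explicit: $\chi_E * \theta_\eta(\lambda) = \frac1\pi\big[\arctan\frac{E_L - \lambda}{\eta} - \arctan\frac{E - \lambda}{\eta}\big]$, and summing over eigenvalues gives $\operatorname{Tr}\chi_E * \theta_\eta(H) = \frac{N}{\pi}\int_E^{E_L}\operatorname{Im} m(y + i\eta)\,dy$, the Green function functional that enters Corollary 6.2 and Theorem 6.3. The Python sketch below checks this identity on a GOE-type sample and prints the sharp count next to the smoothed one; the sample, the window $[E, E_L]$ and the value of $\eta$ are arbitrary illustrative choices, not the scales of Lemma 6.1.

```python
import numpy as np


def smoothed_count(eigs: np.ndarray, E: float, E_L: float, eta: float) -> float:
    """Tr chi_E * theta_eta(H) via the closed form of the Cauchy convolution."""
    terms = np.arctan((E_L - eigs) / eta) - np.arctan((E - eigs) / eta)
    return float(np.sum(terms) / np.pi)


def smoothed_count_from_m(eigs: np.ndarray, E: float, E_L: float, eta: float,
                          points: int = 20001) -> float:
    """(N / pi) * int_E^{E_L} Im m(y + i eta) dy, by the trapezoidal rule."""
    ys = np.linspace(E, E_L, points)
    im_m = np.array([np.mean(eta / ((eigs - y) ** 2 + eta ** 2)) for y in ys])
    h = ys[1] - ys[0]
    integral = h * (im_m.sum() - (im_m[0] + im_m[-1]) / 2)
    return len(eigs) * integral / np.pi


rng = np.random.default_rng(3)
n = 300
a = rng.normal(size=(n, n))
eigs = np.linalg.eigvalsh((a + a.T) / np.sqrt(2 * n))

E, E_L, eta = 1.8, 2.3, 0.01
sharp = np.count_nonzero((eigs >= E) & (eigs <= E_L))
print(sharp, smoothed_count(eigs, E, E_L, eta), smoothed_count_from_m(eigs, E, E_L, eta))
```

The grid spacing is chosen much smaller than $\eta$, so the quadrature error is negligible compared with the difference between the sharp and smoothed counts that Lemma 6.1 controls.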

Let $q : \mathbb R \to \mathbb R_+$ be a smooth cutoff function such that

$$q(x) = 1 \ \text{ if } |x| \le 1/9, \qquad q(x) = 0 \ \text{ if } |x| \ge 2/9,$$

and we assume that $q(x)$ is decreasing for $x \ge 0$.
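Any smooth function with these properties will do. One concrete choice, given here only as an illustration (it uses the standard smooth step built from $b(t) = e^{-1/t}\mathbf 1(t > 0)$ and is not taken from the paper), is $q(x) = 1 - s(9|x| - 1)$ with $s(t) = b(t)/(b(t) + b(1 - t))$. A Python sketch:

```python
import numpy as np


def _bump_part(t: np.ndarray) -> np.ndarray:
    """b(t) = exp(-1/t) for t > 0 and 0 otherwise; smooth on the whole real line."""
    t = np.asarray(t, dtype=float)
    safe = np.where(t > 0, t, 1.0)  # avoid division by zero for t <= 0
    return np.where(t > 0, np.exp(-1.0 / safe), 0.0)


def smooth_step(t: np.ndarray) -> np.ndarray:
    """C-infinity step: 0 for t <= 0, 1 for t >= 1, increasing in between."""
    a, b = _bump_part(t), _bump_part(1.0 - np.asarray(t, dtype=float))
    return a / (a + b)  # a + b > 0 for every real t


def q(x: np.ndarray) -> np.ndarray:
    """Cutoff with q = 1 on |x| <= 1/9, q = 0 on |x| >= 2/9, decreasing for x >= 0."""
    return 1.0 - smooth_step(9.0 * np.abs(np.asarray(x, dtype=float)) - 1.0)


xs = np.linspace(0.0, 0.3, 301)
vals = q(xs)
print(vals[::50])
```

Since $q$ is constant near $x = 0$, the non-smoothness of $|x|$ at the origin does not affect the smoothness of $q$.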
Corollary 6.2 Suppose the assumptions of Lemma 6.1 hold and $E$ satisfies

$$|E - 2|N^{2/3} \le (\log N)^L. \tag{6.19}$$
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Let i := \l x N 2e = ±iV~ 2 / 3 ~ £ . Then the inequality 

TrxE+i * 6 n (H) - N~ £ < N(E, oo) < Tr X E-i * 9 V (H) + N~ e (6.20) 

holds with a probability bigger than 1 — Ccxp[— c(log N)^ L ]. Furthermore, we have 

Eq(Tr XE ~i*0 ri (H)) <V(N{E,oo) =0) <E q(Tr XE+e * 8 n (H)) + Ccxp[- c(\ogN)* L ] (6.21) 

for sufficiently large N independent of E as long as (6.19) holds. Notice that the directions in the inequalities 
(6.20) and (6.21) are opposite since q is decreasing for positive arguments. 

Proof. For any E satisfying (6.19) we have E L - E > £ thus \E-2- £\N 2 / Z < §(logiV) L (see (6.7)), 
therefore (6.8) holds for E replaced with j/S [E — £, E] as well. We thus obtain 



f E 

T±Xb(H)<V 1 I dyTr Xy {H) 

JE-l 

<T X f dylixy *0 r ,(H)+Ct- 1 f dy[N- 2£ + X(y-£ 1 ,y + £ 1 )] 
Je-i Je-i 



-2e . „*1, 



< Tt Xe ^ * 9 n (H) + CN~ 2e + CjJi{E -2£,E + £) 

with a probability larger than 1 - Cexp[-c(logiV)* L ]. From (2.26), (6.19), h/£ = 2N~ 2e and £ < iV~ 2 / 3 , 
we can bound 



o f E+£ 1 

—N(E — 2£ E + £) < N l ~ 2e I " ff\A* j. at-' ; n™- \r\ L i <r _ at-' 
£ Je 



Q sc {x)dx + N- 2£ (\ogN) L i < -N- 



<E-1l 2 



with a very high probability, where we estimated the explicit integral using that the integration domain is 
in a C7V~ 2 / 3 (logiV) L -vicinity of the edge at 2. We have thus proved 

7f(E,E L ) = Tr XE {H) < Tr X £-£ * 9 V {H) + N~ e . 

By (6.2), we can replace ^(E, El) by N(E, oo) with a change of probability of at most Cexp[— c(log N)^ L ]. 
This proves the upper bound of (6.20) and the lower bound can be proved similarly. 

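On the event that (6.20) holds, the condition $\mathcal N(E,\infty) = 0$ implies that $\operatorname{Tr}\chi_{E+\ell}*\theta_\eta(H)\le 1/9$. Thus we have

$$ \mathbb P(\mathcal N(E,\infty) = 0) \le \mathbb P\big(\operatorname{Tr}\chi_{E+\ell}*\theta_\eta(H)\le 1/9\big) + C\exp[-c(\log N)^{\phi L}]. \tag{6.22} $$

Together with the Markov inequality, this proves the upper bound in (6.21). For the lower bound, we use

$$ \mathbb E\,q\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H)\big) \le \mathbb P\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H)\le 2/9\big) \le \mathbb P\big(\mathcal N(E,\infty)\le 2/9 + N^{-\varepsilon}\big) = \mathbb P(\mathcal N(E,\infty) = 0), $$

where we used the upper bound from (6.20) and that $\mathcal N$ is an integer. This completes the proof of the Corollary. $\square$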
6.1 Green Function Comparison Theorem

Recalling that $\theta_\eta(H) = \frac1\pi\operatorname{Im}G(i\eta)$, Corollary 6.2 bounds the probability of $\mathcal N(E,\infty) = 0$ in terms of the expectations of two functionals of Green functions. In this subsection, we show that the difference between the expectations of these functionals w.r.t. two probability distributions $v$ and $w$ is negligible, assuming their second moments match. The precise statement is the following Green function comparison theorem on the edges. All statements are formulated for the upper spectral edge $2$, but with the same proof they hold for the lower spectral edge $-2$ as well.
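As a numerical illustration of $\theta_\eta(H) = \frac1\pi\operatorname{Im}G(i\eta)$ (a sketch, not part of the argument): for $\chi = \mathbb 1_{[E,E_L]}$ one has $\chi*\theta_\eta(\lambda) = \frac1\pi\big[\arctan\frac{E_L-\lambda}{\eta} - \arctan\frac{E-\lambda}{\eta}\big]$, so $\operatorname{Tr}\chi*\theta_\eta(H)$ can be computed directly from the eigenvalues and compared with $\frac N\pi\int_E^{E_L}\operatorname{Im}m(y+i\eta)\,dy$. The spectrum below is hypothetical, and the midpoint rule with `steps` nodes is an assumed discretization of the integral.

```python
import math


def smoothed_count_closed_form(eigs, E, E_L, eta):
    """Tr (chi_[E, E_L] * theta_eta)(H) from eigenvalues, using
    (chi * theta_eta)(lam) = (atan((E_L - lam)/eta) - atan((E - lam)/eta)) / pi."""
    return sum(
        (math.atan((E_L - lam) / eta) - math.atan((E - lam) / eta)) / math.pi
        for lam in eigs
    )


def im_m(eigs, y, eta):
    """Im m(y + i eta) = (1/N) * sum_k eta / ((lam_k - y)^2 + eta^2)."""
    return sum(eta / ((lam - y) ** 2 + eta ** 2) for lam in eigs) / len(eigs)


def smoothed_count_quadrature(eigs, E, E_L, eta, steps=20_000):
    """(N / pi) * integral_E^{E_L} Im m(y + i eta) dy, by the midpoint rule."""
    h = (E_L - E) / steps
    integral = h * sum(im_m(eigs, E + (k + 0.5) * h, eta) for k in range(steps))
    return len(eigs) * integral / math.pi


# Hypothetical spectrum with two eigenvalues near the edge 2.
eigs = [-1.5, 0.3, 1.95, 2.05, 2.6]
exact = smoothed_count_closed_form(eigs, 1.9, 2.3, 0.05)
approx = smoothed_count_quadrature(eigs, 1.9, 2.3, 0.05)
```

The two numbers agree up to the quadrature error; the smoothed count is not an integer, which is why Corollary 6.2 needs the shifts by $\pm\ell$ and the cutoff $q$.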

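Theorem 6.3 (Green function comparison theorem on the edge) Suppose that the assumptions of Theorem 2.4, including (2.40), hold. Let $F:\mathbb R\to\mathbb R$ be a function whose derivatives satisfy

$$ |F^{(\alpha)}(x)|\,(|x|+1)^{-C_1} \le C_1, \qquad \alpha = 1,2,3,4, \tag{6.23} $$

with some constant $C_1>0$. Then there exists $\varepsilon_0>0$ depending only on $C_1$ such that for any $\varepsilon<\varepsilon_0$ and for any real numbers $E$, $E_1$ and $E_2$ satisfying

$$ |E-2|\le N^{-2/3+\varepsilon}, \qquad |E_1-2|\le N^{-2/3+\varepsilon}, \qquad |E_2-2|\le N^{-2/3+\varepsilon}, $$

and setting $\eta = N^{-2/3-\varepsilon}$, we have

$$ \Big|\mathbb E^{v}F\big(N\eta\operatorname{Im}m(z)\big) - \mathbb E^{w}F\big(N\eta\operatorname{Im}m(z)\big)\Big| \le CN^{-1/6+C\varepsilon}, \qquad z = E+i\eta, \tag{6.24} $$

and

$$ \Big|\mathbb E^{v}F\Big(N\int_{E_1}^{E_2}dy\,\operatorname{Im}m(y+i\eta)\Big) - \mathbb E^{w}F\Big(N\int_{E_1}^{E_2}dy\,\operatorname{Im}m(y+i\eta)\Big)\Big| \le CN^{-1/6+C\varepsilon} \tag{6.25} $$

for some constant $C$ and large enough $N$ depending only on $C_1$, $\vartheta$, $\delta_\pm$ and $C_0$ (in (2.4)).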



Theorem 6.3 holds in a much greater generality. We state the following extension which can be used to 
prove (2.42), the generalization of Theorem 2.4. The class of functions F in the following theorem can be 
enlarged to allow some polynomially increasing functions similar to (6.23). But for the application to prove 
(2.42), the following form is sufficient. The proof of Theorem 6.4 is similar to that of Theorem 6.3 and will 
be omitted. 

Theorem 6.4 Suppose that the assumptions of Theorem 2.4, including (2.40), hold. Fix any $k\in\mathbb N_+$ and let $F:\mathbb R^k\to\mathbb R$ be a bounded smooth function with bounded derivatives. Then for any sufficiently small $\varepsilon$ there exists a $\delta>0$ such that for any sequence of real numbers $E_k<\dots<E_1<E_0$ with $|E_j-2|\le N^{-2/3+\varepsilon}$, $j = 0,1,\dots,k$, we have

$$ \Big|(\mathbb E^{v}-\mathbb E^{w})\,F\Big(N\int_{E_1}^{E_0}dy\,\operatorname{Im}m(y+i\eta),\ \dots,\ N\int_{E_k}^{E_0}dy\,\operatorname{Im}m(y+i\eta)\Big)\Big| \le N^{-\delta}. \tag{6.26} $$



Assuming that Theorem 6.3 holds, we now prove Theorem 2.4. 
Proof of Theorem 2.4. As we discussed in (6.2) and (6.3), we can assume that (6.4) holds for the parameter $s$. We define $E := 2 + sN^{-2/3}$, which satisfies (6.19). We define $E_L$ as in (6.5) with the $L$ such that (6.2) and (6.3) hold. For simplicity, we set $\xi = \phi L$ and note that $\xi>2$ for sufficiently large $N$. With the left side of (6.21), for any sufficiently small $\varepsilon>0$, we have

$$ \mathbb E^{w}q\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H)\big) \le \mathbb P^{w}(\mathcal N(E,\infty) = 0) \tag{6.27} $$

with the choice

$$ \ell := \tfrac12N^{-2/3-\varepsilon}, \qquad \eta := N^{-2/3-9\varepsilon}. $$

The bound (6.25) applied to the case $E_1 = E-\ell$ and $E_2 = E_L$ shows that there exists $\delta>0$, for sufficiently small $\varepsilon>0$, such that

$$ \mathbb E^{v}q\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H)\big) \le \mathbb E^{w}q\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H)\big) + N^{-\delta} \tag{6.28} $$

(note that $9\varepsilon$ plays the role of the $\varepsilon$ in the Green function comparison theorem). Then applying the right side of (6.21) in Corollary 6.2, with $\xi = \phi L>2$, to the l.h.s. of (6.28), we have

$$ \mathbb P^{v}(\mathcal N(E-2\ell,\infty) = 0) \le \mathbb E^{v}q\big(\operatorname{Tr}\chi_{E-\ell}*\theta_\eta(H)\big) + C\exp[-c(\log N)^{2}]. \tag{6.29} $$

Combining these inequalities, we have

$$ \mathbb P^{v}(\mathcal N(E-2\ell,\infty) = 0) \le \mathbb P^{w}(\mathcal N(E,\infty) = 0) + 2N^{-\delta} \tag{6.30} $$

for sufficiently small $\varepsilon>0$ and sufficiently large $N$. Recalling that $E = 2 + sN^{-2/3}$, this proves the first inequality of (2.41) and, by switching the roles of $v$ and $w$, the second inequality of (2.41) as well. This completes the proof of Theorem 2.4. $\square$

Proof of Theorem 6.3. Notice that

$$ N\int_{E_1}^{E_2}dy\,\operatorname{Im}m(y+i\eta) = \eta\int_{E_1}^{E_2}dy\,\operatorname{Tr}G(z)G(z)^{*}, \qquad z = y+i\eta. \tag{6.31} $$
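The identity behind (6.31) is pointwise in $y$: for Hermitian $H$, $G - G^* = 2i\eta\,GG^*$, hence $N\operatorname{Im}m(z) = \operatorname{Im}\operatorname{Tr}G(z) = \eta\sum_{i,j}|G_{ij}(z)|^2$. The following pure-Python sketch checks it on a small real symmetric matrix; the helper `inverse` (Gauss-Jordan elimination) and the random test matrix are assumptions made only for this illustration.

```python
import random


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def symmetric_test_matrix(n, seed=0):
    """Hypothetical real symmetric matrix with N(0, 1/n) entries."""
    rng = random.Random(seed)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            H[i][j] = H[j][i] = rng.gauss(0.0, 1.0) / n ** 0.5
    return H


def both_sides_631(H, E, eta):
    """Return (N Im m(z), eta * sum_ij |G_ij|^2) for z = E + i eta."""
    n = len(H)
    z = complex(E, eta)
    G = inverse([[H[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)])
    lhs = sum(G[i][i] for i in range(n)).imag  # Im Tr G = N Im m(z)
    rhs = eta * sum(abs(G[i][j]) ** 2 for i in range(n) for j in range(n))
    return lhs, rhs


lhs, rhs = both_sides_631(symmetric_test_matrix(8), E=1.9, eta=0.05)
```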

We now set up notations to replace the matrix elements one by one. This step is identical for the proof of both (6.24) and (6.25), and we will use the notations of the case (6.24), which are less involved. Fix a bijective ordering map on the index set of the independent matrix elements,

$$ \phi:\{(i,j):1\le i\le j\le N\}\to\{1,\dots,\gamma(N)\}, \qquad \gamma(N) := \frac{N(N+1)}{2}, \tag{6.32} $$

and denote by $H_\gamma$ the generalized Wigner matrix whose matrix elements $h_{ij}$ follow the $w$-distribution if $\phi(i,j)\le\gamma$ and follow the $v$-distribution otherwise; in particular $H_0 = H^{(v)}$ and $H_{\gamma(N)} = H^{(w)}$. The specific choice of the ordering map (6.32) is irrelevant; in the following argument, $\phi$ could be any bijective ordering map. With $\eta = N^{-2/3-\varepsilon}$, it was proved in (2.21) that for any constant $\xi>0$,

$$ \mathbb P\Big(\max_{0\le\gamma\le\gamma(N)}\ \max_{1\le k,l\le N}\ \max_{E}\ \Big|\Big(\frac{1}{H_\gamma - E - i\eta}\Big)_{kl} - \delta_{kl}m_{sc}(E+i\eta)\Big| \le N^{-1/3+2\varepsilon}\Big) \ge 1 - C\exp[-c(\log N)^{\xi}] \tag{6.33} $$
with some constants $C$, $c$ and large enough $N\ge N_0$ (which may depend on $\xi$). The last maximum in (6.33) runs over all $E$ satisfying $|E-2|\le N^{-2/3+\varepsilon}$. When applying (2.21), we have used $(\log N)^{4L}(N\eta)^{-1}\le N^{-1/3+2\varepsilon}$ and that

$$ \operatorname{Im}m_{sc}(E+i\eta) \le \sqrt{|E-2|+\eta} \le CN^{-1/3+\varepsilon/2} \tag{6.34} $$

for $|E-2|\le CN^{-2/3+\varepsilon}$.
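The bound (6.34) can be checked numerically. The Python sketch below assumes the standard closed form $m_{sc}(z) = \frac12\big(-z+\sqrt{z^2-4}\big)$, with the branch chosen so that $\operatorname{Im}m_{sc}>0$ for $\operatorname{Im}z>0$ (equivalently, $m_{sc}$ is the solution of $m^2+zm+1=0$ in the upper half plane); the grid of distances $|E-2|$ and of $\eta$ values is arbitrary.

```python
import cmath
import math


def m_sc(z: complex) -> complex:
    """Stieltjes transform of the semicircle law on [-2, 2].

    The product of principal square roots sqrt(z - 2) * sqrt(z + 2) has its
    branch cut on [-2, 2] and behaves like z at infinity, which selects the
    branch with Im m > 0 for Im z > 0.
    """
    return (-z + cmath.sqrt(z - 2) * cmath.sqrt(z + 2)) / 2


def worst_ratio_634(kappas, etas):
    """Largest value of Im m_sc(E + i eta) / sqrt(|E - 2| + eta) on the grid."""
    worst = 0.0
    for kappa in kappas:
        for E in (2 - kappa, 2 + kappa):  # just inside and just outside the edge
            for eta in etas:
                ratio = m_sc(complex(E, eta)).imag / math.sqrt(abs(E - 2) + eta)
                worst = max(worst, ratio)
    return worst


worst = worst_ratio_634([0.0, 1e-4, 1e-3, 1e-2, 0.1], [1e-6, 1e-4, 1e-2, 0.1])
```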



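We set $z = E+i\eta$, where $|E-2|\le CN^{-2/3+\varepsilon}$ and $\eta = N^{-2/3-\varepsilon}$. From (6.33), (6.34) and the identity

$$ \operatorname{Im}m(z) = \frac1N\operatorname{Im}\operatorname{Tr}G = \frac{\eta}{N}\sum_{i,j}G_{ij}\overline{G_{ij}}, $$

we have that

$$ \eta^2\sum_{i\ne j}G_{ij}\overline{G_{ij}} \le |N\eta\operatorname{Im}m(z)| \le CN^{2\varepsilon} \tag{6.35} $$

and

$$ \eta^2\sum_{i}G_{ii}\overline{G_{ii}} \le N\eta^2\big(|m_{sc}| + CN^{-1/3+2\varepsilon}\big)^2 \le CN^{-1/3-2\varepsilon} \tag{6.36} $$

hold with a probability larger than $1 - C\exp[-c(\log N)^{\xi}]$. Since the derivative of $F$ is bounded as in (6.23), there exists $C$ depending on $F$, $\vartheta$, $\delta_\pm$ and $C_0$ such that

$$ \Big|\mathbb E F\Big(\eta^2\sum_{i,j}G_{ij}\overline{G_{ij}}\Big) - \mathbb E F\Big(\eta^2\sum_{i\ne j}G_{ij}\overline{G_{ij}}\Big)\Big| \le CN^{-1/3+C\varepsilon}. \tag{6.37} $$

This holds for both the $v$ and the $w$ ensembles.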

To show (6.24), we only need to prove that for small enough $\varepsilon$, there exists $C$ depending on $F$, $\vartheta$, $\delta_\pm$ and $C_0$ such that

$$ \Big|\mathbb E^{v}F\Big(\eta^2\sum_{i\ne j}G^{(v)}_{ij}\overline{G^{(v)}_{ij}}\Big) - \mathbb E^{w}F\big(G^{(v)}\to G^{(w)}\big)\Big| \le CN^{-1/6+C\varepsilon}, \tag{6.38} $$

where $G^{(v)}$ and $G^{(w)}$ denote the Green functions of $H^{(v)}$ and $H^{(w)}$, respectively. Here the shorthand notation $F(G^{(v)}\to G^{(w)})$ means that we consider the same argument of $F$ as in the first term in (6.38), but all $G^{(v)}$ terms are replaced with $G^{(w)}$. In fact, the upper index notation is slightly superfluous since the Green function is the same, only the underlying ensemble measure changes, but we wish to emphasize the difference between the two ensembles in this way as well.

Similarly, for (6.25), we only need to prove that for small enough $\varepsilon$, there exists $C$ depending on $F$, $\vartheta$, $\delta_\pm$ and $C_0$ such that

$$ \Big|\mathbb E^{v}F\Big(\eta\int_{E_1}^{E_2}dy\sum_{i\ne j}G^{(v)}_{ij}\overline{G^{(v)}_{ij}}(y+i\eta)\Big) - \mathbb E^{w}F\big(G^{(v)}\to G^{(w)}\big)\Big| \le CN^{-1/6+C\varepsilon}. \tag{6.39} $$
Consider the telescopic sum of differences of expectations

$$ \mathbb E F\big(H^{(v)}\to H^{(w)}\big) - \mathbb E F\big(H^{(v)}\big) = \sum_{\gamma=1}^{\gamma(N)}\Big[\mathbb E F\big(H^{(v)}\to H_\gamma\big) - \mathbb E F\big(H^{(v)}\to H_{\gamma-1}\big)\Big], \tag{6.40} $$

where $F(H^{(v)}\to H_\gamma)$ denotes the argument of $F$ in (6.38) evaluated with the Green function of $H_\gamma$ in place of that of $H^{(v)}$. Let $E^{(ij)}$ denote the matrix whose matrix elements are zero everywhere except at the $(i,j)$ position, where it is 1, i.e., $E^{(ij)}_{k\ell} = \delta_{ik}\delta_{j\ell}$. Fix a $\gamma\ge1$ and let $(a,b)$ be determined by $\phi(a,b) = \gamma$. To simplify the notation, we assume that $a\ne b$; the $a=b$ case can be treated similarly. We note that the total number of diagonal terms is $N$ and that of the off-diagonal terms is $O(N^2)$. We will compare $H_{\gamma-1}$ with $H_\gamma$ for each $\gamma$ and then sum up the differences according to (6.40).
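The replacement scheme of (6.32) and (6.40) amounts to simple bookkeeping, sketched here in Python with deterministic stand-ins for $H^{(v)}$ and $H^{(w)}$. The row-major ordering in `ordering_map` and the test functional `total` are hypothetical choices; the text stresses that any bijection $\phi$ works. Consecutive matrices $H_{\gamma-1}$ and $H_\gamma$ differ in at most the two entries $(a,b)$, $(b,a)$, and the telescopic sum collapses to the difference of the endpoints.

```python
def ordering_map(n):
    """One bijection phi from {(i, j): i <= j} onto {1, ..., n(n+1)/2}.

    Row-major order is an arbitrary (hypothetical) choice; any bijection works.
    """
    pairs = [(i, j) for i in range(n) for j in range(i, n)]
    return {pair: k + 1 for k, pair in enumerate(pairs)}


def interpolate(Hv, Hw, gamma, phi):
    """H_gamma: entries with phi(i, j) <= gamma come from Hw, the rest from Hv."""
    n = len(Hv)
    H = [[None] * n for _ in range(n)]
    for (i, j), k in phi.items():
        src = Hw if k <= gamma else Hv
        H[i][j] = src[i][j]
        H[j][i] = src[j][i]  # the (j, i) entry moves together with (i, j)
    return H


def total(M):
    """A hypothetical test functional: the sum of all entries."""
    return sum(map(sum, M))


n = 4
Hv = [[i + j + 1 for j in range(n)] for i in range(n)]
Hw = [[-(i + j) - 2 for j in range(n)] for i in range(n)]
phi = ordering_map(n)
g_max = n * (n + 1) // 2
telescoped = sum(
    total(interpolate(Hv, Hw, g, phi)) - total(interpolate(Hv, Hw, g - 1, phi))
    for g in range(1, g_max + 1)
)
```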

Note that these two matrices differ only in the $(a,b)$ and $(b,a)$ matrix elements and they can be written as

$$ H_{\gamma-1} = Q + \frac{1}{\sqrt N}V, \qquad V := v_{ab}E^{(ab)} + v_{ba}E^{(ba)}, $$

$$ H_\gamma = Q + \frac{1}{\sqrt N}W, \qquad W := w_{ab}E^{(ab)} + w_{ba}E^{(ba)}, \tag{6.41} $$

with a matrix $Q$ that has zero matrix elements at the $(a,b)$ and $(b,a)$ positions and where we set $v_{ji} := \overline{v_{ij}}$ for $i<j$, and similarly for $w$. Define the Green functions

$$ R := \frac{1}{Q-z}, \qquad S := \frac{1}{H_{\gamma-1}-z}, \qquad T := \frac{1}{H_\gamma - z}. \tag{6.42} $$

We first claim that the estimate (6.33) holds for the Green function $R$ as well. More precisely, the probability of the event

$$ \Omega_R := \Big\{\max_{1\le k,l\le N}\ \max_{E}\ |R_{kl}(E+i\eta) - \delta_{kl}m_{sc}(E+i\eta)| > N^{-1/3+2\varepsilon}\Big\} \tag{6.43} $$

(where $\max_E$ is the maximum over all $E$ with $|E-2|\le N^{-2/3+\varepsilon}$) satisfies

$$ \mathbb P(\Omega_R) \le C\exp[-c(\log N)^{\xi}] \tag{6.44} $$

for any fixed $\xi>0$. To see this, we use the resolvent expansion

$$ R = S + N^{-1/2}SVS + N^{-1}(SV)^2S + \dots + N^{-9/2}(SV)^9S + N^{-5}(SV)^{10}R. \tag{6.45} $$

Since $V$ has at most two nonzero elements, when computing the $(k,\ell)$ matrix element of this matrix identity, each term is a sum of finitely many terms (i.e. the number of summands is $N$-independent) that involve matrix elements of $S$ or $R$ and $v_{ij}$, e.g. $(SVS)_{k\ell} = S_{ki}v_{ij}S_{j\ell} + S_{kj}v_{ji}S_{i\ell}$. Using the bound (6.33) for the $S$ matrix elements, the subexponential decay for $v_{ij}$ and the trivial bound $|R_{ij}|\le\eta^{-1}\le N$, we obtain that the estimate (6.33) holds for $R$ as well.
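Since $R - S = N^{-1/2}SVR$, iterating this identity ten times gives (6.45) exactly. A pure-Python check follows, assuming a small random symmetric $Q$ with $Q_{ab} = Q_{ba} = 0$ and a hypothetical value $v_{ab} = 0.7$. The spectral parameter is taken away from the real axis only to keep the floating point check well conditioned; the identity itself holds for every $z$ with $\operatorname{Im}z\ne0$. The helpers `inverse`, `matmul`, `resolvent` and `rhs_645` are illustrative.

```python
import random


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matmul(A, B):
    """Plain matrix product of nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def resolvent(M, z):
    """(M - z)^{-1} for a square matrix M given as nested lists."""
    n = len(M)
    return inverse([[M[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)])


def rhs_645(S, V, R, N, order=10):
    """sum_{k<order} N^{-k/2} (SV)^k S  +  N^{-order/2} (SV)^order R."""
    n = len(S)
    SV = matmul(S, V)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # (SV)^0
    out = [[0j] * n for _ in range(n)]
    for k in range(order):
        term = matmul(power, S)
        out = [[o + N ** (-k / 2) * t for o, t in zip(ro, rt)] for ro, rt in zip(out, term)]
        power = matmul(power, SV)
    rest = matmul(power, R)
    return [[o + N ** (-order / 2) * t for o, t in zip(ro, rt)] for ro, rt in zip(out, rest)]


n, a, b, v = 6, 1, 4, 0.7
rng = random.Random(1)
Q = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        if (i, j) != (a, b):  # Q has zero (a, b) and (b, a) entries
            Q[i][j] = Q[j][i] = rng.gauss(0.0, 1.0) / n ** 0.5
V = [[0.0] * n for _ in range(n)]
V[a][b] = V[b][a] = v
z = complex(0.5, 1.0)
R = resolvent(Q, z)
S = resolvent([[Q[i][j] + V[i][j] / n ** 0.5 for j in range(n)] for i in range(n)], z)
err = max(abs(x - y) for rx, ry in zip(R, rhs_645(S, V, R, N=n)) for x, y in zip(rx, ry))
```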

After having introduced these notations, we are in a position to give a heuristic power counting argument that is the core of the proof. In particular, we can explain the origin of the second moment matching condition. Take $F(x) = x$ for simplicity. A resolvent expansion analogous to (6.45) gives

$$ \mathbb E\,\eta\sum_i\operatorname{Im}S_{ii} = \eta\,\mathbb E\operatorname{Im}\sum_i\Big[R_{ii} - N^{-1/2}(RVR)_{ii} + N^{-1}\big((RV)^2R\big)_{ii} + \dots\Big] \tag{6.46} $$
which is an expansion in powers of $N^{-1/2}$, since the matrix $N^{-1/2}V$ contains only a few nonzero elements of size $N^{-1/2}$. Notice that $\eta\sum_i\operatorname{Im}S_{ii}$ estimates the number of eigenvalues near $E$ in a window of size $\eta$. For the two ensembles to have the same local eigenvalue distribution on scale $\eta$, we need the error term to be less than order one even after performing the telescopic sum. In the bulk, $\eta$ has to be chosen as $\eta\sim N^{-1}$ and we can view $\eta\sum_i$ as order one in the power counting. Since in the telescopic expansion we will have $N^2$ terms to sum up, we need the error term of the expansion to be $o(N^{-2})$ for each replacement step, i.e., for each fixed label $(a,b)$. This explains the usual condition that four moments be identical for the Green function comparison theorem in the bulk [22], since the first four terms in (6.46) have to be equal. Near the edges, i.e., at energies $E$ with $|E-2|\le N^{-2/3}$, the correct local scale is $\eta\sim N^{-2/3}$, and the strong local semicircle law (2.21) implies that the off-diagonal Green functions are of order $N^{-1/3}$ and the diagonal Green functions are bounded. Hence the size of the third order term $\eta\sum_iN^{-3/2}((RV)^3R)_{ii}$ is of order

$$ \eta\,N\,N^{-3/2}N^{-2/3} = N^{-2+1/6}, $$

where we used that, for a generic label $(a,b)$, there are at least two off-diagonal resolvent terms in $((RV)^3R)_{ii}$. Notice that this error term is still larger than the $N^{-2}$ required for summing over $a,b$ (this argument would be sufficient if three moments matched and only the fourth order term in (6.46) needed to be estimated). The key observation is that the leading term, which gives this order $N^{-2+1/6}$, actually has almost zero expectation, which improves the error to $o(N^{-2})$. This is due to the fact that with the help of (6.33) we are able to follow the main term in the diagonal elements of the Green functions and thus compute the expectation fairly precisely. Notice that similar reasons apply to the proof of Lemma 4.1 in Section 7.
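The exponent bookkeeping of this heuristic can be written out with exact rational arithmetic. In the Python sketch below (function names hypothetical), $\eta\sim N^{-2/3}$, the sum over $i$ contributes a factor $N$, each off-diagonal resolvent entry contributes $N^{-1/3}$ and the $m$-th order term carries $N^{-m/2}$; summing over the $\sim N^2$ replacement steps adds $2$ to the exponent. This is why the third order term must be improved to the $N^{-13/6}$ of Lemma 6.5.

```python
from fractions import Fraction


def edge_term_exponent(order, eta_exp=Fraction(-2, 3), offdiag=2, offdiag_exp=Fraction(-1, 3)):
    """Exponent of N in eta * sum_i N^{-order/2} ((RV)^order R)_ii near the edge.

    eta ~ N^{eta_exp}, the sum over i gives a factor N, and `offdiag`
    off-diagonal resolvent entries contribute N^{offdiag_exp} each.
    """
    return eta_exp + 1 - Fraction(order, 2) + offdiag * offdiag_exp


def after_telescoping(per_step_exp, steps_exp=2):
    """Exponent after summing about N^{steps_exp} replacement steps."""
    return per_step_exp + steps_exp


third = edge_term_exponent(3)    # -11/6, i.e. N^{-2 + 1/6}
fourth = edge_term_exponent(4)   # -7/3, already o(N^{-2})
```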

6.2 Main Lemma 

The key step to the proof of Theorem 6.3 is the following lemma: 

Lemma 6.5 Fix an index $\gamma$, recall the definitions of $Q$, $R$ and $S$ from (6.42) and suppose first that $\gamma = \phi(a,b)$ with $a\ne b$. For any small $\varepsilon>0$ and under the assumptions in Theorem 6.3 on $F$, $E$, $E_1$ and $E_2$, there exists $C$ depending on $F$, $\vartheta$, $\delta_\pm$ and $C_0$ (but independent of $\gamma$), and there exist constants $A_N$ and $B_N$, depending on the distribution of the matrix $Q$, denoted by $\operatorname{dist}(Q)$, and on the second moments of $v_{ab}$, denoted by $m_2(v_{ab})$, such that

$$ \Big|\mathbb E F\Big(\eta^2\sum_{i\ne j}S_{ij}S_{ji}(z)\Big) - \mathbb E F\Big(\eta^2\sum_{i\ne j}R_{ij}R_{ji}(z)\Big) - A_N\big(m_2(v_{ab}),\operatorname{dist}(Q)\big)\Big| \le CN^{-13/6+C\varepsilon} \tag{6.47} $$

with $z = E+i\eta$, $\eta = N^{-2/3-\varepsilon}$, and

$$ \Big|\mathbb E F\Big(\eta\int_{E_1}^{E_2}dy\sum_{i\ne j}S_{ij}S_{ji}(y+i\eta)\Big) - \mathbb E F\Big(\eta\int_{E_1}^{E_2}dy\sum_{i\ne j}R_{ij}R_{ji}(y+i\eta)\Big) - B_N\big(m_2(v_{ab}),\operatorname{dist}(Q)\big)\Big| \le CN^{-13/6+C\varepsilon} \tag{6.48} $$

for large enough $N$ (independent of $\gamma$). The constants $A_N$ and $B_N$ may also depend on $F$ and on the parameters $\vartheta$, $\delta_\pm$ and $C_0$, but they depend on the centered random variable $v_{ab}$ only through its second moments.

Finally, if $a=b$, i.e. $\gamma = \phi(a,a)$, then the bounds (6.47) and (6.48) hold with $CN^{-11/6+C\varepsilon}$ on their right hand side.

The same estimates hold if $S$ is replaced by $T$ everywhere; note that $Q$ is independent of $v_{ab}$ and $w_{ab}$. Since $m_2(v_{ab}) = m_2(w_{ab})$, we obviously have $A_N(m_2(v_{ab}),\operatorname{dist}(Q)) = A_N(m_2(w_{ab}),\operatorname{dist}(Q))$. Thus we get from Lemma 6.5 that in case of $a\ne b$

$$ \Big|\mathbb E F\Big(\eta^2\sum_{i\ne j}S_{ij}S_{ji}(z)\Big) - \mathbb E F\Big(\eta^2\sum_{i\ne j}T_{ij}T_{ji}(z)\Big)\Big| \le CN^{-13/6+C\varepsilon} \tag{6.49} $$

and a similar bound for the quantity (6.48). In case of $a=b$, the estimate is only $CN^{-11/6+C\varepsilon}$. Recalling the definitions of $S$ and $T$ from (6.42), the bound (6.49) compares the expectation of a function of the resolvent of $H_\gamma$ and that of $H_{\gamma-1}$. The telescopic summation then implies (6.38) and (6.39), since the number of summands with $a\ne b$ is of order $N^2$ but the number of summands with $a=b$ is only $N$. This completes the proof of Theorem 6.3. $\square$

Proof of Lemma 6.5. We will only prove the more complicated case (6.48); the proof can be adapted easily for (6.47), which will be omitted. Similarly to $\Omega_R$ from (6.43), define

$$ \Omega_S := \Big\{\max_{1\le k,l\le N}\ \max_{E}\ |S_{kl}(E+i\eta) - \delta_{kl}m_{sc}(E+i\eta)| > N^{-1/3+2\varepsilon}\Big\}, $$

where $\max_E$ is the maximum over all $E$ with $|E-2|\le N^{-2/3+\varepsilon}$. Since $S$ is the Green function of $H_{\gamma-1}$, we obtain from (6.33) directly that

$$ \mathbb P(\Omega_S) \le C\exp[-c(\log N)^{\xi}] \tag{6.50} $$

for any fixed $\xi>0$. Finally, set

$$ \Omega_V := \{|v_{ab}|\ge N^{\varepsilon}\sigma_{ab}\}, \qquad \Omega := \Omega_R\cup\Omega_S\cup\Omega_V. \tag{6.51} $$

Using (6.44), (6.50) and the subexponential decay of $v_{ab}$, we obtain

$$ \mathbb P(\Omega) \le C\exp[-c(\log N)^{\xi}] \tag{6.52} $$

for any fixed $\xi>0$ and large enough $N$. Since the arguments of $F$ in (6.48) are bounded by $CN^{2+2\varepsilon}$ and $F(x)$ increases at most polynomially, it is easy to see that the contribution of the set $\Omega$ to the expectations in (6.48) is negligible. We can thus concentrate on the set $\Omega^c$.
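Define $x_S$ and $x_R$ by

$$ x_S := \eta\int_{E_1}^{E_2}dy\sum_{i\ne j}S_{ij}S_{ji}(y+i\eta), \qquad x_R := \eta\int_{E_1}^{E_2}dy\sum_{i\ne j}R_{ij}R_{ji}(y+i\eta), \tag{6.53} $$

and decompose $x_S$ into three parts

$$ x_S = x^S_0 + x^S_1 + x^S_2, \qquad x^S_k := \eta\int_{E_1}^{E_2}dy\sum_{\substack{i\ne j\\ |\{i,j\}\cap\{a,b\}| = k}}S_{ij}S_{ji}(y+i\eta); \tag{6.54} $$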
and $x^R_k$ are defined similarly. Here $k = |\{i,j\}\cap\{a,b\}|$ is the number of times $a$ and $b$ appear among the summation indices $i,j$ (if $a=b$ then we count it only once); clearly $k = 0$, $1$ or $2$. The number of terms in the summation of $x^S_k$ is $O(N^{2-k})$ since $a$ and $b$ are fixed. From the resolvent expansion, we have

$$ S = R - N^{-1/2}RVR + N^{-1}(RV)^2R - N^{-3/2}(RV)^3R + N^{-2}(RV)^4S. \tag{6.55} $$
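For $a\ne b$ the counting claim can be made exact: the numbers of ordered pairs $(i,j)$, $i\ne j$, with $k = 0,1,2$ are $(N-2)(N-3)$, $4(N-2)$ and $2$, which are indeed $O(N^{2-k})$. A brute-force Python check (the function name is illustrative):

```python
def overlap_counts(n, a, b):
    """Number of ordered pairs (i, j), i != j, with |{i, j} & {a, b}| = k, for k = 0, 1, 2."""
    if a == b:
        raise ValueError("this sketch assumes a != b")
    counts = {0: 0, 1: 0, 2: 0}
    for i in range(n):
        for j in range(n):
            if i != j:
                counts[len({i, j} & {a, b})] += 1
    return counts
```

The three counts add up to $N(N-1)$, the total number of off-diagonal pairs in $x_S$.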

In the following formulas we will omit the spectral parameter from the notation of the resolvents. The spectral parameter is always $y+i\eta$ with $y\in[E_1,E_2]$, in particular $|y-2|\le N^{-2/3+\varepsilon}$. If $|\{i,j\}\cap\{a,b\}| = k$, using (6.55) and (6.33), we have in $\Omega^c$

$$ \big|N^{-m/2}[(RV)^mR]_{ij}\big| \le C_mN^{-m/2+3m\varepsilon}N^{-(2-k)/3}, \qquad m\in\mathbb N_+,\ k = 0,1,2, \tag{6.56} $$

for some constants $C_m$. Furthermore, we can replace the last $R$ by $S$, i.e., we also have

$$ \big|N^{-2}[(RV)^4S]_{ij}\big| \le CN^{-2-(2-k)/3+C\varepsilon}. \tag{6.57} $$

Therefore, in $\Omega^c$ we have

$$ |x^S_k - x^R_k| \le CN^{-5/6-2k/3+C\varepsilon}, \qquad k = 0,1,2. \tag{6.58} $$

Inserting these bounds into the Taylor expansion of $F$ and keeping only the terms larger than $o(N^{-2})$, we obtain

$$ \Big|\mathbb E[F(x_S) - F(x_R)] - \mathbb E\Big(F'(x_R)(x^S_0 - x^R_0) + \tfrac12F''(x_R)(x^S_0 - x^R_0)^2 + F'(x_R)(x^S_1 - x^R_1)\Big)\Big| \le CN^{-13/6+C\varepsilon}, \tag{6.59} $$

where we used the remark after (6.52) to treat the contribution on the event $\Omega$. Since no $x^S_2 - x^R_2$ term appears in (6.59), we can focus on the cases $k = 0$ or $1$.

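For $k = 0$ or $1$, we define $Q^{(k)}_\ell$ for $\ell = 1,2$ or $3$ as the sum of the terms in $x^S_k - x^R_k$ in which the total number of $v_{ab}$ or $v_{ba}$ factors is $\ell$, i.e.,

$$ Q^{(k)}_1 := -N^{-1/2}\eta\int_{E_1}^{E_2}dy\sum_{\substack{i\ne j\\ |\{i,j\}\cap\{a,b\}| = k}}\big[R_{ij}(RVR)_{ji} + (RVR)_{ij}R_{ji}\big], \tag{6.60} $$

$$ Q^{(k)}_2 := N^{-1}\eta\int_{E_1}^{E_2}dy\sum_{\substack{i\ne j\\ |\{i,j\}\cap\{a,b\}| = k}}\big[R_{ij}((RV)^2R)_{ji} + ((RV)^2R)_{ij}R_{ji} + (RVR)_{ij}(RVR)_{ji}\big], \tag{6.61} $$

$$ Q^{(k)}_3 := -N^{-3/2}\eta\int_{E_1}^{E_2}dy\sum_{\substack{i\ne j\\ |\{i,j\}\cap\{a,b\}| = k}}\big[R_{ij}((RV)^3R)_{ji} + ((RV)^3R)_{ij}R_{ji} + ((RV)^2R)_{ij}(RVR)_{ji} + (RVR)_{ij}((RV)^2R)_{ji}\big]. \tag{6.62} $$

By these definitions and (6.56), we have

$$ |Q^{(k)}_\ell| \le N^{-\ell/2-1/3-2k/3+C\varepsilon} \quad\text{in } \Omega^c. \tag{6.63} $$

Furthermore, with (6.56) and (6.57), we decompose $x^S_k - x^R_k$ as

$$ x^S_k - x^R_k = Q^{(k)}_1 + Q^{(k)}_2 + Q^{(k)}_3 + O(N^{-7/3+C\varepsilon}). \tag{6.64} $$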
The last two terms in (6.62) can also be bounded by using (6.56), i.e.,

$$ Q^{(k)}_3 = O(N^{-13/6+C\varepsilon}) - N^{-3/2}\eta\int_{E_1}^{E_2}dy\sum_{\substack{i\ne j\\ |\{i,j\}\cap\{a,b\}| = k}}\big[R_{ij}((RV)^3R)_{ji} + ((RV)^3R)_{ij}R_{ji}\big] \quad\text{in } \Omega^c. \tag{6.65} $$

Inserting (6.63) and (6.64) into the second term of the l.h.s. of (6.59), with the bounds on the derivatives of $F$, we have

$$ \mathbb E\Big(F'(x_R)(x^S_0 - x^R_0) + F'(x_R)(x^S_1 - x^R_1) + \tfrac12F''(x_R)(x^S_0 - x^R_0)^2\Big) = B + \mathbb E F'(x_R)Q^{(0)}_3 + O(N^{-13/6+C\varepsilon}), \tag{6.66} $$

where

$$ B := \mathbb E\Big(\sum_{k=0,1}F'(x_R)\big[Q^{(k)}_1 + Q^{(k)}_2\big] + \tfrac12F''(x_R)\big(Q^{(0)}_1\big)^2\Big) = \mathbb E\Big(\sum_{k=0,1}F'(x_R)\,\mathbb E_{v_{ab}}\big[Q^{(k)}_1 + Q^{(k)}_2\big] + \tfrac12F''(x_R)\,\mathbb E_{v_{ab}}\big[(Q^{(0)}_1)^2\big]\Big) \tag{6.67} $$

depends on $v_{ab}$ only through its expectation (which is zero) and on its second moments.

First we give a trivial estimate on $Q^{(0)}_3$. In case $i$, $j$ are distinct from $a$ and $b$, it is easy to see by writing out the terms in (6.65) that they contain at least three off-diagonal elements of the resolvent; for example, in the term $R_{ij}R_{ja}v_{ab}R_{ba}v_{ab}R_{ba}v_{ab}R_{bi}$, appearing in $R_{ij}((RV)^3R)_{ji}$, the resolvent matrix elements $R_{ij}R_{ja}R_{bi}$ are off-diagonal. Each off-diagonal matrix element of $R$ is bounded by $N^{-1/3+2\varepsilon}$ in $\Omega_R^c$, while the diagonal terms can be estimated by $|m_{sc}|$, hence by a constant, at a negligible error in the set $\Omega^c\subset\Omega_R^c$. This shows that each term in the integrand in (6.65) is bounded by $C[N^{-1/3+2\varepsilon}]^3$. Note that every estimate is uniform in $y$, the real part of the spectral parameter, as long as $|y-2|\le N^{-2/3+\varepsilon}$. Estimating $F'$ trivially, we thus obtain

$$ |\mathbb E[F(x_S) - F(x_R)] - B| \le CN^{-11/6+C\varepsilon}. $$

This bound proves Lemma 6.5 for the case $a=b$.

For $a\ne b$ this estimate would not be sufficient, since the number of pairs $a\ne b$ to sum up in the telescopic summation is of order $N^2$. However, we will show that in this case the expectation of the $Q^{(0)}_3$ term is of smaller order than the trivial estimate gives.
From now on we assume that $a\ne b$. By (6.65) we have, in $\Omega^c$, that

$$
\begin{aligned}
Q^{(0)}_3 &= O(N^{-13/6+C\varepsilon}) - N^{-3/2}\eta\int_{E_1}^{E_2}dy\sum_{i\ne a,b}\ \sum_{j\ne i,a,b}\Big(R_{ij}R_{ja}v_{ab}R_{bb}v_{ba}R_{aa}v_{ab}R_{bi} + R_{ia}v_{ab}R_{bb}v_{ba}R_{aa}v_{ab}R_{bj}R_{ji}\Big) + (a\leftrightarrow b)\\
&= O(N^{-13/6+C\varepsilon}) - N^{-3/2}\eta\int_{E_1}^{E_2}dy\sum_{i\ne a,b}\ \sum_{j\ne i,a,b}\Big(m_{sc}^2R_{ij}R_{ja}R_{bi} + m_{sc}^2R_{ia}R_{bj}R_{ji}\Big)|v_{ab}|^2v_{ab} + (a\leftrightarrow b).
\end{aligned}
\tag{6.68}
$$

Note that we explicitly collected those terms that contain the most diagonal elements of $R$; these are the main terms of $Q^{(0)}_3$. There are several other terms, for example $R_{ij}R_{ja}v_{ab}R_{ba}v_{ab}R_{ba}v_{ab}R_{bi}$, that appear in the expansion of $R_{ij}[(RV)^3R]_{ji}$, but these are lower order terms and can be directly included in the error term. In the second step in (6.68) we estimated the diagonal terms by $m_{sc}$ at a negligible error in the set $\Omega^c\subset\Omega^c_R$.

We note that $v_{ab}$ is independent of $R$ and $\mathbb E_{v_{ab}}|v_{ab}|^2v_{ab} = O(1)$. Combining (6.68) with (6.66) and (6.59), we obtain

$$
\begin{aligned}
|\mathbb E[F(x_S) - F(x_R)] - B| &\le CN^{-13/6+C\varepsilon} + |\mathbb E F'(x_R)Q^{(0)}_3|\\
&\le CN^{-13/6+C\varepsilon} + CN^{-5/6+C\varepsilon}\max_{y}\ \max_{\substack{i\ne j\\ \{i,j\}\cap\{a,b\} = \emptyset}}\Big(|\mathbb E F'(x_R)R_{ij}R_{ja}R_{bi}| + |\mathbb E F'(x_R)R_{ia}R_{bj}R_{ji}| + (a\leftrightarrow b)\Big),
\end{aligned}
\tag{6.69}
$$

where we used the trivial bounds on $F'$ and $m_{sc}$, and we again used that every estimate is uniform in $y$, the real part of the spectral parameter, as long as $|y-2|\le N^{-2/3+\varepsilon}$. As before, $\max_y$ in the last line of (6.69) indicates the maximum over all $y$ with $|y-2|\le N^{-2/3+\varepsilon}$, and the spectral parameter of all resolvents is $y+i\eta$. The following lemma shows that the expectation of the product of the off-diagonal terms in (6.69) is of smaller order than the trivial estimate gives.

Lemma 6.6 Under the assumptions of Lemma 6.5 and assuming that $a,b,i,j$ are all different, we have

$$ |\mathbb E F'(x_R)R_{ij}R_{ja}R_{bi}(y+i\eta)| \le N^{-4/3+C\varepsilon} \tag{6.70} $$

for any $y$ with $|y-2|\le N^{-2/3+\varepsilon}$, and the same estimate holds for the other three terms on the r.h.s. of (6.69).



If this lemma holds, then we have thus proved in the case $a\ne b$ that

$$ |\mathbb E[F(x_S) - F(x_R)] - B| \le N^{-13/6+C\varepsilon}, \tag{6.71} $$

where $B$ is defined in (6.67). With the definitions of the $x$'s in (6.53), this completes the proof of Lemma 6.5 for the remaining $a\ne b$ case. $\square$
Proof of Lemma 6.6. With the relation between $R$ and $S$ in (6.45) and (6.56), one can see that (6.70) is implied by

$$ |\mathbb E F'(x_S)S_{ij}S_{ja}S_{bi}| \le N^{-4/3+C\varepsilon} \tag{6.72} $$

under the assumption that $a,b,i,j$ are all different. This replacement is only a technical convenience when we apply the large deviation estimate (Lemma 3.3) below. Lemma 3.3 was formulated with random variables of equal variance, while the matrix elements of $Q$ cannot all be normalized to have the same variance since two matrix elements are zero. The contribution of these two elements is negligible anyway, but the presentation of the argument is simpler if we do not have to carry them separately in the notation. Since $S$ is the Green function of a usual generalized Wigner matrix with all variances being positive, it is easier to deal with (6.72) instead of (6.70).

From the identity (3.7) applied to the Green function $S$, we have for any different $i$, $j$ and $a$

$$ |S_{ij} - S^{(a)}_{ij}| = |S_{ia}S_{aj}(S_{aa})^{-1}| \le CN^{-2/3+C\varepsilon} \quad\text{in } \Omega^c. \tag{6.73} $$

From (6.33) we have

$$ |S_{ij}| \le N^{-1/3+C\varepsilon}, \qquad i\ne j, \quad\text{in } \Omega^c. \tag{6.74} $$

Combining (6.73) and (6.74), we have

$$ |x_S - x^{(a)}_S| \le N^{-1/3+C\varepsilon}, \tag{6.75} $$

where $x^{(a)}_S$ is defined using the resolvent of the matrix $H^{(a)}_{\gamma-1}$ exactly as $x_S$ was defined using the resolvent $S$ of the matrix $H_{\gamma-1}$. As usual, $H^{(a)}_{\gamma-1}$ denotes the matrix $H_{\gamma-1}$ with the $a$-th row and column removed. Similarly, we have

$$ \big|S_{ij}S_{ja}S_{bi} - S^{(a)}_{ij}S_{ja}S^{(a)}_{bi}\big| \le N^{-4/3+C\varepsilon} \quad\text{in } \Omega^c. \tag{6.76} $$

Hence by these inequalities and the bounds on the derivatives of $F$, we have

$$ |\mathbb E F'(x_S)S_{ij}S_{ja}S_{bi}| \le \big|\mathbb E F'(x^{(a)}_S)S^{(a)}_{ij}S_{ja}S^{(a)}_{bi}\big| + O(N^{-4/3+C\varepsilon}). \tag{6.77} $$

Applying the identity (3.5) to $S_{ja}$, we have

$$ S_{ja} = S_{jj}S^{(j)}_{aa}Z^{(a)}_j, \qquad Z^{(a)}_j := \sum_{s,t\notin\{a,j\}}h_{js}S^{(ja)}_{st}h_{ta} - h_{ja}, \tag{6.78} $$

where $h_{\alpha\beta} = (H_{\gamma-1})_{\alpha\beta}$. With the bound on the matrix elements of $S$ in (6.33) and the identity (3.7), in the set $\Omega^c$ we have

$$ S^{(ja)}_{st} = O(N^{-1/3+C\varepsilon})\ \ (s\ne t), \qquad S_{jj} = m_{sc} + O(N^{-1/3+C\varepsilon}), \qquad S^{(j)}_{aa} = m_{sc} + O(N^{-1/3+C\varepsilon}). \tag{6.79} $$

Setting

$$ \Omega_Z := \big\{|Z^{(a)}_j| \ge N^{-1/3+C\varepsilon}\big\} $$

with a sufficiently large constant $C$, Lemma 3.3 implies that

$$ \mathbb P(\Omega^c\cap\Omega_Z) \le C\exp[-c(\log N)^{\xi}] $$

for any fixed $\xi>0$, since on the set $\Omega^c$ the entries of $S^{(ja)}$ obey the bounds in (6.79). Therefore, with (6.78), in $\Omega^c\cap\Omega^c_Z$ we have

$$ S_{ja} = m_{sc}^2Z^{(a)}_j + O(N^{-2/3+C\varepsilon}). \tag{6.80} $$

Combining (6.80) with (6.77), we see that

$$ |\mathbb E F'(x_S)S_{ij}S_{ja}S_{bi}| \le |m_{sc}|^2\,\Big|\mathbb E\,F'(x^{(a)}_S)\,S^{(a)}_{ij}S^{(a)}_{bi}\Big(\sum_{s,t\notin\{a,j\}}h_{js}S^{(ja)}_{st}h_{ta} - h_{ja}\Big)\Big| + O(N^{-4/3+C\varepsilon}). \tag{6.81} $$

Since $x^{(a)}_S$, $S^{(a)}_{ij}$, $S^{(a)}_{bi}$, $h_{js}$ and $S^{(ja)}_{st}$ are all independent of the $a$-th row and column of $H_{\gamma-1}$, and the expectations of $h_{ta}$ and $h_{ja}$ are zero, the first term on the r.h.s. of (6.81) equals zero. This implies (6.72) and completes the proof of (6.70). The other terms in (6.69) can be bounded similarly. This completes the proof of Lemma 6.6. $\square$
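The two resolvent identities used in this proof can be checked numerically: $S_{ij} - S^{(a)}_{ij} = S_{ia}S_{aj}/S_{aa}$ for $i,j\ne a$ (the form of (3.7) used in (6.73)), and $S_{ja} = S_{jj}S^{(j)}_{aa}\big(\sum_{s,t\notin\{a,j\}}h_{js}S^{(ja)}_{st}h_{ta} - h_{ja}\big)$ (the form of (3.5) used in (6.78)). The Python sketch below assumes a small random real symmetric matrix, arbitrary distinct indices, and a hypothetical helper `resolvent_minor` that returns the Green function of a minor keyed by the original indices.

```python
import random


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def resolvent_minor(H, z, removed=()):
    """Entries of (H^{(removed)} - z)^{-1}, keyed by the ORIGINAL indices."""
    keep = [k for k in range(len(H)) if k not in removed]
    G = inverse([[H[r][c] - (z if r == c else 0) for c in keep] for r in keep])
    return {(r, c): G[x][y] for x, r in enumerate(keep) for y, c in enumerate(keep)}


rng = random.Random(2)
n = 7
H = [[0.0] * n for _ in range(n)]
for p in range(n):
    for q in range(p, n):
        H[p][q] = H[q][p] = rng.gauss(0.0, 1.0) / n ** 0.5
z = complex(1.9, 0.05)
S = resolvent_minor(H, z)
i, j, a = 0, 3, 5  # hypothetical distinct indices

# Form used in (6.73): S_ij - S^(a)_ij = S_ia S_aj / S_aa.
Sa = resolvent_minor(H, z, removed=(a,))
err_673 = abs(S[i, j] - Sa[i, j] - S[i, a] * S[a, j] / S[a, a])

# Form used in (6.78): S_ja = S_jj S^(j)_aa Z, Z = sum h_js S^(ja)_st h_ta - h_ja.
Sj = resolvent_minor(H, z, removed=(j,))
Sja = resolvent_minor(H, z, removed=(j, a))
others = [s for s in range(n) if s not in (a, j)]
Z = sum(H[j][s] * Sja[s, t] * H[t][a] for s in others for t in others) - H[j][a]
err_678 = abs(S[j, a] - S[j, j] * Sj[a, a] * Z)
```

Both identities are exact algebraic facts (Schur complement formulas), so the residuals are at the level of floating point rounding.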



7 Proof of Lemma 4.1 



7.1 Setup and notations 



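The $p$-th moment of $\frac1N\sum_{i=1}^NZ_i$ is given by

$$ \mathbb E\,\mathbb 1(\Gamma^c)\Big|\frac1N\sum_{q=1}^NZ_q\Big|^p = \frac{1}{N^p}\sum_{q_1=1}^N\cdots\sum_{q_p=1}^N\mathbb E\,\mathbb 1(\Gamma^c)\,Z^{\#}_{q_1}Z^{\#}_{q_2}\cdots Z^{\#}_{q_p}, \tag{7.1} $$

where the various $\#$'s can be either nothing or the complex conjugate. The precise choice of $\#$ will be irrelevant for our argument and the summation over them yields an irrelevant overall factor $2^p$.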
We write up the definition of $Z_{q_\alpha}$ from (3.16) as follows:

$$ Z_{q_\alpha} = \sum_{q^2_\alpha,\,q^3_\alpha\ne q_\alpha}G^{(q_\alpha)}_{q^2_\alpha q^3_\alpha}\big[h_{q_\alpha q^2_\alpha}h_{q^3_\alpha q_\alpha} - \delta_{q^2_\alpha q^3_\alpha}\sigma^2_{q_\alpha q^2_\alpha}\big], \tag{7.2} $$

where the summation is over all $q^2_\alpha\ne q_\alpha$ and $q^3_\alpha\ne q_\alpha$. To bookkeep the indices in a uniform way, we denote $q_\alpha$ by $q^1_\alpha$ and we organize the three indices $(q^1_\alpha,q^2_\alpha,q^3_\alpha)$ into a vector $\mathbf q_\alpha$ for each $\alpha = 1,2,\dots,p$. Furthermore, we organize these $p$ vectors into a $3\times p$ matrix $\mathbf q = (q^j_\alpha)$, for $\alpha = 1,\dots,p$ and $j = 1,2,3$, with entries taking values in $\mathbb N_N := \{1,2,\dots,N\}$. The slots of the matrix $\mathbf q$, parametrized by $(j,\alpha)$, $\alpha = 1,2,\dots,p$, $j = 1,2,3$, are called vertices, since we will build a graph upon them. The element $q^j_\alpha$ will be called the index assigned to the vertex $(j,\alpha)$. The first entry $q^1_\alpha$ in $\mathbf q_\alpha$ will play a special role: it will be called the location index, while the other two indices, $q^2_\alpha$ and $q^3_\alpha$, will be called nonlocation indices. Similarly, $(1,\alpha)$ will be called a location vertex and $(2,\alpha)$, $(3,\alpha)$ will be called nonlocation vertices. A pair of indices is called a label. We also define the set of labels in $\mathbf q_\alpha$ that contain $q^1_\alpha$:

$$ Q_\alpha := \big\{(q^1_\alpha,q^2_\alpha),\,(q^2_\alpha,q^1_\alpha),\,(q^1_\alpha,q^3_\alpha),\,(q^3_\alpha,q^1_\alpha)\big\}, \qquad \alpha = 1,2,\dots,p, $$

and sometimes we will use a single letter $\nu$ or $\mu$ for labels, i.e. for elements of $\bigcup_{\alpha=1}^pQ_\alpha$. Note that $Q_\alpha$ contains any label $\nu$ together with its transpose $\nu^t$, where $\nu^t := (p,q)$ if $\nu = (q,p)$. Carrying $\nu$ together with its transpose is necessary since $h_\nu = \overline{h_{\nu^t}}$, i.e. matrix elements with labels $\nu$ and $\nu^t$ are not independent.
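With these notations, we have

$$ \mathbb E\,\mathbb 1(\Gamma^c)\Big|\frac1N\sum_{q=1}^NZ_q\Big|^p = \frac{1}{N^p}\sum_{\mathbf q}\Xi_{\mathbf q}, \tag{7.3} $$

where we defined

$$ \Xi_{\mathbf q} := \mathbb E\,\mathbb 1(\Gamma^c)\prod_{\alpha=1}^p\Big[G^{(q^1_\alpha)}_{q^2_\alpha q^3_\alpha}\,\xi(\mathbf q_\alpha)\Big], \qquad \xi(\mathbf q_\alpha) := h_{q^1_\alpha q^2_\alpha}h_{q^3_\alpha q^1_\alpha} - \delta_{q^2_\alpha q^3_\alpha}\sigma^2_{q^1_\alpha q^2_\alpha}. \tag{7.4} $$

The summation in (7.3) runs over all $3\times p$ matrices $\mathbf q$ with elements from $\mathbb N_N$ and with the restriction that

$$ q^1_\alpha\ne q^2_\alpha \quad\text{and}\quad q^1_\alpha\ne q^3_\alpha. \tag{7.5} $$

Let

$$ Q_{\mathbf q} = Q := \bigcup_{\alpha=1}^pQ_\alpha \tag{7.6} $$

denote the set of all possible labels of $h$-variables appearing in the $\xi(\mathbf q_\alpha)$ factors, and notice that its cardinality is bounded by $|Q|\le4p$.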
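The label bookkeeping of (7.4) to (7.6) is easy to mechanize. The small Python sketch below (names illustrative) builds $Q_\alpha$ from a column $(q^1_\alpha,q^2_\alpha,q^3_\alpha)$, enforces the restriction (7.5), and shows that coincidences, within a column or between columns, make $|Q|$ smaller than the generic value $4p$.

```python
def labels(col):
    """Q_alpha for a column (q1, q2, q3): the labels containing q1, with transposes."""
    q1, q2, q3 = col
    if q1 in (q2, q3):
        raise ValueError("restriction (7.5): q1 must differ from q2 and q3")
    return {(q1, q2), (q2, q1), (q1, q3), (q3, q1)}


def all_labels(columns):
    """Q = union of Q_alpha over the p columns of the 3 x p index matrix."""
    out = set()
    for col in columns:
        out |= labels(col)
    return out


generic = all_labels([(1, 2, 3), (4, 5, 6)])      # p = 2, no coincidences
coincident = all_labels([(1, 2, 3), (2, 1, 3)])   # labels shared between columns
```

In the second example the two columns share the labels $(1,2)$ and $(2,1)$, which is exactly the kind of coincidence that produces non-zero expectations.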

We would like to compute the expectation in (7.4) by first taking the expectation with respect to the $h_\nu$-variables explicitly appearing in the $\xi$'s. Recall that $G^{(q)} = (H^{(q)}-z)^{-1}$ is the Green function of $H^{(q)}$, which is the $(N-1)\times(N-1)$ matrix obtained after removing the $q$-th row and column from $H$. Thus $G^{(q^1_\alpha)}$ is independent of the random variables $h_\nu$, $\nu\in Q_\alpha$, i.e. of those $h$-variables that explicitly appear in $\xi(\mathbf q_\alpha)$. There are, however, three complications. First, while each Green function $G^{(q^1_\alpha)}$, $\alpha = 1,2,\dots,p$, is independent of $h_\nu$, $\nu\in Q_\alpha$, by definition, it still depends on the other $h_\mu$-variables, $\mu\in Q_\beta$, $\beta\ne\alpha$. Second, we have to deal with coincidences: the same $h$-variable may appear in $\xi(\mathbf q_\alpha)$ and $\xi(\mathbf q_\beta)$ with $\alpha\ne\beta$; in fact these terms give the non-zero contributions. We will develop a graphical scheme to bookkeep the structure of coincidences and estimate the number of off-diagonal resolvent elements. Finally, there is a small technical problem related to the factor $\mathbb 1(\Gamma^c)$, which depends on all $h$-variables, but this factor equals one with a very high probability, so a fairly easy argument can remove it.

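To resolve the first problem, we use the resolvent expansion to express explicitly the dependence of $G^{(q^1_\alpha)}$ on the random variables $h_\nu$ with label $\nu\in Q_\beta$, $\beta\ne\alpha$. For $\mathbf q$ fixed, let $U^{(\alpha)} = U^{(\alpha)}_{\mathbf q}$ be the matrix

$$ (U^{(\alpha)})_{ik} := (H^{(q^1_\alpha)})_{ik} \quad\text{for } (i,k)\in Q^{(\alpha)} := Q^{(\alpha)}_{\mathbf q} := \bigcup_{\beta\in\{1,\dots,p\},\ \beta\ne\alpha}Q_\beta, \tag{7.7} $$

and $(U^{(\alpha)})_{ik} := 0$ otherwise. Note that the number of nonzero matrix elements of $U^{(\alpha)}$ is bounded by $|Q|\le4p$. Define

$$ H^{[\alpha]} := H^{(q^1_\alpha)} - U^{(\alpha)}, \qquad G^{[\alpha]} := \big(H^{(q^1_\alpha)} - U^{(\alpha)} - z\big)^{-1}. $$

Notice that $G^{[\alpha]}$ is independent of all the $h$-factors that explicitly appear in $\prod_{\beta=1}^p\xi(\mathbf q_\beta)$. From the resolvent expansion, we have

$$ G^{(q^1_\alpha)} = \sum_{n_\alpha=0}^{\infty}\big(-G^{[\alpha]}U^{(\alpha)}\big)^{n_\alpha}G^{[\alpha]}. \tag{7.8} $$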
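Expansion (7.8) is a Neumann series, and its partial sums converge geometrically once $\|G^{[\alpha]}U^{(\alpha)}\|<1$. The Python sketch below illustrates this numerically. `G0` plays the role of $G^{[\alpha]}$ and `U` that of $U^{(\alpha)}$; the four nonzero entries of size $0.05$ in symmetric positions, the random base matrix and $z = 0.5+0.5i$ (so that $\|G^{[\alpha]}\|\le2$) are all hypothetical choices.

```python
import random


def inverse(A):
    """Gauss-Jordan inversion with partial pivoting (complex entries allowed)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def matmul(A, B):
    """Plain matrix product of nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def resolvent(M, z):
    """(M - z)^{-1} for a square matrix M given as nested lists."""
    n = len(M)
    return inverse([[M[i][j] - (z if i == j else 0) for j in range(n)] for i in range(n)])


def neumann_errors(G0, U, exact, K):
    """Max-entry error of sum_{k=0}^{K'} (-G0 U)^k G0 against `exact`, for K' = 0..K."""
    n = len(G0)
    step = [[-x for x in row] for row in matmul(G0, U)]  # -G0 U
    term = [row[:] for row in G0]                         # (-G0 U)^0 G0
    partial = [[0j] * n for _ in range(n)]
    errors = []
    for _ in range(K + 1):
        partial = [[p + t for p, t in zip(rp, rt)] for rp, rt in zip(partial, term)]
        errors.append(max(abs(p - e) for rp, row_e in zip(partial, exact) for p, e in zip(rp, row_e)))
        term = matmul(step, term)
    return errors


rng = random.Random(3)
n = 6
H0 = [[0.0] * n for _ in range(n)]
for r in range(n):
    for c in range(r, n):
        H0[r][c] = H0[c][r] = rng.gauss(0.0, 1.0) / n ** 0.5
j, m, nn = 0, 2, 4  # hypothetical positions of the four U-entries
U = [[0.0] * n for _ in range(n)]
for r, c in ((j, m), (m, j), (j, nn), (nn, j)):
    U[r][c] = 0.05
z = complex(0.5, 0.5)
G0 = resolvent(H0, z)
exact = resolvent([[H0[r][c] + U[r][c] for c in range(n)] for r in range(n)], z)
errors = neumann_errors(G0, U, exact, K=30)
```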
To estimate the size of these Green functions, we first note that there is a positive universal constant $c$ such that on the set $\Gamma^c$ we have



-W 1 -* 31 "° "" (bg7V)2 ' C " lGul ~ X + (\ogNf 
This follows from the fact that $c'\le|m_{sc}(z)|\le1$ with some positive universal constant $c'>0$ and for any $z\in S_L$, see


= A ^7i^7^ ^^ f 7wW 

sith 
(3.13). By the perturbation formulas (3.6) and (3.7) we have $G^{(k)}_{ij} = G_{ij} - G_{ik}G_{kj}/G_{kk}$ for $i,j\ne k$; thus we also have



^l^ 2A ^GcJvF ^^* 1 + WW> (7 - 10) 

where $i,j\ne k$. In the good set $\Gamma^c$, the matrix elements of $U^{(\alpha)}$ satisfy

\U^\< {l ° gN X /W <N-^ (7.H) 



'N 
(here we used that $L\le\log N/\log\log N$), and $G^{[\alpha]}$ is bounded as

^' G &^ 2A ^aoW' |g ^ 1+ g^vf inrc - (7 - 12) 

To see (7.12), we expand

$$ G^{[\alpha]} = \sum_{m=0}^{\infty}\big(G^{(q^1_\alpha)}U^{(\alpha)}\big)^mG^{(q^1_\alpha)} $$

and use (7.11) and the bounds (7.10) on the matrix elements of $G^{(q)}$.

Using (7.11) and (7.12) and recalling that only finitely many matrix elements of $U^{(\alpha)}$ are non-zero, we easily see that the expansion (7.8) is convergent and it can be truncated at a finite $n_\alpha$ so that the error term can be estimated. Thus there will be no convergence problem and we will focus on getting estimates.

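We set

$$ \mathbf n := (n_1,n_2,\dots,n_p), \qquad |\mathbf n| = \sum_{\alpha=1}^pn_\alpha. $$

With this expansion, we can write (7.4) as

$$ \Xi_{\mathbf q} = \sum_{n=0}^{\infty}\sum_{|\mathbf n| = n}\Xi^{\mathbf n}_{\mathbf q}, \qquad \Xi^{\mathbf n}_{\mathbf q} := \mathbb E\,\mathbb 1(\Gamma^c)\prod_{\alpha=1}^p\big[M^{(n_\alpha)}_\alpha\,\xi(\mathbf q_\alpha)\big], \tag{7.13} $$

$$ M^{(n_\alpha)}_\alpha := \Big[\big(-G^{[\alpha]}U^{(\alpha)}\big)^{n_\alpha}G^{[\alpha]}\Big]_{q^2_\alpha q^3_\alpha} \tag{7.14} $$

$$ = \sum_{\nu^1_\alpha,\dots,\nu^{n_\alpha}_\alpha\in Q^{(\alpha)}}\mathcal V(\boldsymbol\nu_\alpha,\mathbf q_\alpha,n_\alpha) \tag{7.15} $$

with $\boldsymbol\nu_\alpha := (\nu^1_\alpha,\nu^2_\alpha,\dots,\nu^{n_\alpha}_\alpha)$, where we have expanded $U^{(\alpha)}$ appearing in $[(-G^{[\alpha]}U^{(\alpha)})^{n_\alpha}G^{[\alpha]}]_{q^2_\alpha q^3_\alpha}$ and used the notation

$$ \mathcal V(\boldsymbol\nu_\alpha,\mathbf q_\alpha,n_\alpha) := (-1)^{n_\alpha}G^{[\alpha]}_{\mu^1_\alpha}h_{\nu^1_\alpha}G^{[\alpha]}_{\mu^2_\alpha}h_{\nu^2_\alpha}\cdots h_{\nu^{n_\alpha}_\alpha}G^{[\alpha]}_{\mu^{n_\alpha+1}_\alpha}. \tag{7.16} $$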
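The summation in (7.15) is over all possible $\nu$-labels of the $h$ factors in (7.16). The appearance of the $\mu$-labels in (7.16) is just a notational simplification; they are explicit functions of $\boldsymbol\nu_\alpha$ and $\mathbf q_\alpha$ as follows:

$$ \mu^1_\alpha = \big(q^2_\alpha,[\nu^1_\alpha]_1\big), \qquad \mu^k_\alpha = \big([\nu^{k-1}_\alpha]_2,[\nu^k_\alpha]_1\big)\ \ (2\le k\le n_\alpha), \qquad \mu^{n_\alpha+1}_\alpha = \big([\nu^{n_\alpha}_\alpha]_2,q^3_\alpha\big), \tag{7.17} $$

where $[\nu^k_\alpha]_1$ and $[\nu^k_\alpha]_2$ denote the first and second element of the label $\nu^k_\alpha$. Notice that $G^{[\alpha]}$ is independent of all matrix elements $h_{jk}$ explicitly appearing in the $\xi$-factors in (7.13). Hence

$$ \Xi^{\mathbf n}_{\mathbf q} = \mathbb E\,\mathbb 1(\Gamma^c)\sum_{\boldsymbol\nu}\prod_{\alpha=1}^p\mathcal V(\boldsymbol\nu_\alpha,\mathbf q_\alpha,n_\alpha)\,\xi(\mathbf q_\alpha), \tag{7.18} $$

where the summation is over all $p$-tuples of label sequences $\boldsymbol\nu = (\boldsymbol\nu_1,\boldsymbol\nu_2,\dots,\boldsymbol\nu_p)\in\Lambda(\mathbf q,\mathbf n) := \prod_{\alpha=1}^p\big[Q^{(\alpha)}\big]^{n_\alpha}$. The number of different $\boldsymbol\nu$'s is bounded by $|\Lambda(\mathbf q,\mathbf n)|\le(4p)^n$.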

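7.2 Strategy of the proof presented in the simplest example

In order to motivate the reader before we start the detailed estimates, we show our strategy via the simplest case $p=2$,

$$ \frac{1}{N^2}\mathbb E\,\mathbb 1(\Gamma^c)\Big|\sum_{i=1}^NZ_i\Big|^2 = \frac{1}{N^2}\mathbb E\,\mathbb 1(\Gamma^c)\sum_{i,j}Z_i\overline{Z_j}. $$

We write out

$$ Z_i = \sum_{k,l}G^{(i)}_{kl}\big[h_{ik}h_{li} - \delta_{kl}\sigma^2_{ik}\big], \qquad Z_j = \sum_{m,n}G^{(j)}_{mn}\big[h_{jm}h_{nj} - \delta_{mn}\sigma^2_{jm}\big]; $$

thus we have

$$ \frac{1}{N^2}\mathbb E\,\mathbb 1(\Gamma^c)\Big|\sum_{i=1}^NZ_i\Big|^2 = \frac{1}{N^2}\mathbb E\,\mathbb 1(\Gamma^c)\sum_{ijklmn}G^{(i)}_{kl}\big[h_{ik}h_{li} - \delta_{kl}\sigma^2_{ik}\big]\,\overline{G^{(j)}_{mn}\big[h_{jm}h_{nj} - \delta_{mn}\sigma^2_{jm}\big]}. \tag{7.19} $$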

In the general notation, $\alpha = 1,2$, and the six indices in the summation are organized into a $3\times2$ matrix with columns $\mathbf q_1 = (q^1_1,q^2_1,q^3_1)$ and $\mathbf q_2 = (q^1_2,q^2_2,q^3_2)$, i.e.

$$ \mathbf q = \begin{pmatrix} q^1_1 & q^1_2\\ q^2_1 & q^2_2\\ q^3_1 & q^3_2\end{pmatrix} = \begin{pmatrix} i & j\\ k & m\\ l & n\end{pmatrix}. $$

The only restriction for these indices is that the top element of each column is distinct from the other two below it. The sets $Q_1 = \{(i,k),(k,i),(i,l),(l,i)\}$ and $Q_2 = \{(j,m),(m,j),(j,n),(n,j)\}$ contain the labels of the $h$ factors that explicitly appear in $Z_i$ and $Z_j$, respectively.

Now we expand $G^{(i)} = G^{(q^1_1)}$ in the variables $h_\nu$ labelled by $\nu\in Q_2$. We thus decompose the minor $H^{(i)} = H^{[1]} + U^{(1)}$, where the matrix $U^{(1)}$ contains only four non-zero entries $h_{jm}$, $h_{mj}$, $h_{jn}$ and $h_{nj}$ with labels from $Q_2$, and $H^{[1]}$ contains all other entries of $H^{(q^1_1)}$. The resolvent $G^{[1]} = (H^{[1]}-z)^{-1}$ is now independent of all expansion variables $h_\nu$ with $\nu\in Q = Q_1\cup Q_2$. Note, however, that this decomposition depends on $\mathbf q$, i.e. it will be different for each summand in (7.19). Since $U^{(1)}$ is small, we can expand

$$ G^{(i)} = G^{(q^1_1)} = G^{[1]} - G^{[1]}U^{(1)}G^{[1]} + G^{[1]}U^{(1)}G^{[1]}U^{(1)}G^{[1]} - \dots, $$

and a similar expansion holds for $G^{(j)} = G^{(q^1_2)}$.

We insert these expansions into (7.19) and organize the terms according to their number of explicit $h$ factors. Effectively, each $h$ factor has size $N^{-1/2}$ (neglecting logarithmic corrections). The centered random variable $\xi(q) = h_{q^1q^2}h_{q^3q^1}-\delta_{q^2q^3}\sigma^2_{q^1q^2}$ has size $N^{-1}$. For the purpose of power counting, the subtracted expectation $\delta_{q^2q^3}\sigma^2_{q^1q^2}$ is treated on the same footing as $hh$.

Typically we need to show that terms with fewer than eight $h$ factors have zero expectation. This compensates for the sixfold summation of order $N^6$ against the prefactor $N^{-2}$ in (7.19). Depending on certain coincidences among the summation indices, terms with fewer than eight $h$ factors sometimes already give a nonzero contribution, but then the combinatorial factor from the summation is smaller. Furthermore, we want to keep track of the number of off-diagonal matrix elements, since the final estimate is in terms of a power of $\Lambda_o$.

The leading term in (7.19),

$$G^{[i]}_{kl}\big[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}\big]\,G^{[j]}_{mn}\big[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}\big], \tag{7.20}$$

has four $h$ factors. Its expectation vanishes unless there are at least two coincidences among the summation indices in (7.19), so the sixfold summation is effectively at most fourfold. The key observation is that if at least one $h$ factor appears linearly in the expansion, then the expectation is zero. However, the quadratic factor $\xi(q_1) = [h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}]$ has zero expectation. Therefore it is not sufficient to set $k = l$ and $m = n$ to get a nonzero contribution; there must be coincidences between the $h$ factors in $[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}]$ and in $[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}]$.

For example, the case $i = j$, $k = m$, $l = n$ yields a nonzero contribution, i.e. the summation is only threefold. If both resolvent elements in (7.20) are off-diagonal, we get an estimate of order $N^{-2}N^3(N^{-1/2})^4\Lambda_o^2 = \Lambda_o^2N^{-1}$. If one of the resolvent elements is diagonal, say $k = l$, then the other one must be diagonal as well, $m = n$, since otherwise the expectation is zero. This forces one more coincidence: either $i = j$ and $k = l = m = n$, or $i = m = n$ and $j = k = l$. In both cases the summation in (7.19) gives only $N^2$ terms, and the total estimate is of order $N^{-2}$.

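The next order terms in the expansion are of the form

$$\big(G^{[i]}U^{(i)}G^{[i]}\big)_{kl}\big[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}\big]\,G^{[j]}_{mn}\big[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}\big] = \sum_{(a,b)\in Q_2}G^{[i]}_{ka}U^{(i)}_{ab}G^{[i]}_{bl}\big[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}\big]\,G^{[j]}_{mn}\big[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}\big]$$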

with five $h$ factors. Two new summation indices, $a$ and $b$, have appeared, but their combinatorics is of order one and not of order $N^2$. In fact, $U^{(i)}_{ab}$ is just one of $h_{jm}$, $h_{nj}$ or their transposes. Again, there must be at least three coincidences among the indices $i,j,k,l,m,n$. Otherwise some $h$ variable appears linearly, or one of the quadratic factors $\xi(q_1)$, $\xi(q_2)$ remains isolated, and either situation leads to zero expectation. It is again easy to see that we collect at least $\Lambda_o^2$ (in fact, typically $\Lambda_o^3$) unless at least one additional index coincides.

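The terms with six $h$ factors are either of the form

$$\big(G^{[i]}U^{(i)}G^{[i]}\big)_{kl}\big[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}\big]\,\big(G^{[j]}U^{(j)}G^{[j]}\big)_{mn}\big[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}\big]$$

or of the form

$$\big(G^{[i]}U^{(i)}G^{[i]}U^{(i)}G^{[i]}\big)_{kl}\big[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}\big]\,G^{[j]}_{mn}\big[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}\big].$$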



In both cases at least two $h$ factors appear linearly, which yields zero expectation, unless there are two coincidences among $i,j,k,l,m,n$. Thus the summation in (7.19) is effectively reduced from $N^6$ to $N^4$ terms. Since $h^6\sim(N^{-1/2})^6 = N^{-3}$, (7.19) is of order $N^{-1}$. Moreover, in all cases there are at least two off-diagonal resolvent elements unless an additional coincidence occurs. Thus the estimate is $N^{-1}(\Lambda_o^2+N^{-1})$. The seventh order terms can be dealt with similarly.

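The lowest order nonzero terms with distinct indices $i,j,k,l,m,n$ have eight $h$ factors. They are of the form

$$\big(G^{[i]}U^{(i)}G^{[i]}U^{(i)}G^{[i]}\big)_{kl}\big[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}\big]\,\big(G^{[j]}U^{(j)}G^{[j]}U^{(j)}G^{[j]}\big)_{mn}\big[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}\big].$$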

We now have four $U$-factors. They can ensure that all variables $h_{ik}$, $h_{li}$, $h_{jm}$, $h_{nj}$ appear quadratically, which prevents zero expectation. For example, the term

$$G^{[i]}_{km}h_{mj}G^{[i]}_{jj}h_{jn}G^{[i]}_{nl}\big[h_{ik}h_{li}-\delta_{kl}\sigma^2_{ik}\big]\,G^{[j]}_{mk}h_{ki}G^{[j]}_{ii}h_{il}G^{[j]}_{ln}\big[h_{jm}h_{nj}-\delta_{mn}\sigma^2_{jm}\big]$$

has nonzero expectation. Moreover, unless there is an index coincidence, four of its resolvents are off-diagonal, so the size of this term is $N^{-2}N^6(N^{-1/2})^8\Lambda_o^4 = \Lambda_o^4$.
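The exponent bookkeeping in this example can be tabulated mechanically. The sketch below uses a hypothetical helper, `n_exponent`, to compute the $N$-exponent of $N^{-2}\cdot N^{\#\text{free indices}}\cdot(N^{-1/2})^{\#h}$ for the cases discussed above. Like the heuristic itself, it ignores logarithms and the powers of $\Lambda_o$ noted in the labels.

```python
from fractions import Fraction


def n_exponent(prefactor, free_indices, h_factors):
    """N-exponent of N^prefactor * N^free_indices * (N^{-1/2})^h_factors
    (logarithms and powers of Lambda_o are ignored)."""
    return prefactor + free_indices - Fraction(h_factors, 2)


# p = 2: prefactor N^{-2}; (free summation indices, number of h factors)
cases = {
    "4 h, i=j, k=m, l=n (times Lambda_o^2)": (3, 4),
    "4 h, diagonal resolvents": (2, 4),
    "6 h, two coincidences": (4, 6),
    "8 h, all indices distinct (times Lambda_o^4)": (6, 8),
}
for name, (free, h) in cases.items():
    print(f"{name}: N^{n_exponent(-2, free, h)}")
```

Using `Fraction` keeps the half-integer contributions of the $h$ factors exact.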

The mechanism to estimate the term (7.18) for general $p$ is the same, but the bookkeeping is more tedious. We have to estimate the size of each nonvanishing term in powers of $N$ and $\Lambda_o$.

The power counting in $N$ is relatively straightforward. If all indices in the matrix $\mathbf q$ are distinct, then at least $2p$ new $h$ factors must come from the $V_{\mathbf q}$ factors. Otherwise some $h$ factor in $\prod_\alpha\xi(q_\alpha)$ appears linearly and the expectation is zero. Thus the total number of $h$ factors is at least $4p$, and their size is estimated by $(N^{-1/2})^{4p} = N^{-2p}$. Together with the $N^{-p}$ prefactor in (7.3), this compensates for the combinatorial factor $N^{3p}$ coming from the summation over all $3\times p$ matrices.

If some indices in $\mathbf q$ coincide, then the corresponding $h$ factors can appear with higher multiplicity in $\prod_\alpha\xi(q_\alpha)$. Their expectation then does not necessarily vanish, even without an additional $h$ factor from $V_{\mathbf q}$. Each coincidence in $\mathbf q$ reduces the number of necessary $h$ factors from $V_{\mathbf q}$ by at most two, so the overall balance of $N$-powers is preserved.
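As a heuristic check of this balance, suppose that each of $c$ index coincidences in $\mathbf q$ lowers the number of required $h$ factors by exactly two, the borderline case of "at most two". The three contributions then combine as follows:

```latex
\[
  \underbrace{N^{-p}}_{\text{prefactor}}
  \cdot \underbrace{N^{3p-c}}_{\text{choices of }\mathbf q}
  \cdot \underbrace{\big(N^{-1/2}\big)^{4p-2c}}_{h\text{ factors}}
  = N^{-p+(3p-c)-(2p-c)} = N^{0}.
\]
```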

The power counting in $\Lambda_o$ is more complicated. It is related to the fact that the expectation of each $\xi$ is zero, so an index coincidence of the form $q^2_\alpha = q^3_\alpha$ does not yet imply a nonvanishing expectation. The requirement of nonzero expectation forces one of two scenarios:

- Coincidences of indices among $h$ factors in different $\xi$ terms. Then typically two indices have to match, so we gain an additional factor $N^{-1}$.
- Matching of $h$ factors in the $\xi$-terms with $U$-factors in the expansion (7.8). In this case, instead of a single resolvent $G^{[\alpha]}$, we consider a longer expansion of the form $G^{[\alpha]}U^{(\alpha)}G^{[\alpha]}\cdots$, which typically has at least two off-diagonal resolvents instead of only one.

These two scenarios yield an additional factor $(\Lambda_o^2+N^{-1})$ for each $\xi$-factor, which gives $(\Lambda_o^2+N^{-1})^p$ as the final estimate.

In the next section we give the precise details of this strategy. 

7.3 Detailed proof of Lemma 4.1

The proof is divided into three parts. The first part is a technical preparation to deal with the very small probability event represented by the set $\Gamma$, where either $h$ or a resolvent is too large; it can be skipped at first reading. In the second part we organize the expansion by encoding the coincidence structure of the various terms in a graph. Finally, in the third part we estimate the size of each term with the help of this graphical representation.
7.3.1 Cutoff of small probability events

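Since $|h_{ij}|^2\le(\log N)^{L/10}N^{-1}$ in the set $\Gamma^c$, and (7.12) also holds in $\Gamma^c$, we clearly have

$$\big|\mathbf 1(\Gamma^c)\,\xi(q_\alpha)\,V_{\mathbf q}(\mu^\alpha,\nu^\alpha,n_\alpha)\big|\le C\Big(\frac{C(\log N)^{L/10}}{\sqrt N}\Big)^{n_\alpha}\frac{(\log N)^{L/10}}{N}. \tag{7.21}$$

Hence we have

$$|\Phi^{\mathbf n}_{\mathbf q}|\le(Cp)^n\Big(\frac{C(\log N)^{L/10}}{\sqrt N}\Big)^{n}\Big(\frac{C(\log N)^{L/10}}{N}\Big)^{p}, \tag{7.22}$$

where $(Cp)^n$ accounts for the combinatorics of the summation over $\nu$ in (7.18). Thus we have

$$\frac{1}{N^p}\sum_{\mathbf q}\sum_{n>6p}\sum_{|\mathbf n|=n}|\Phi^{\mathbf n}_{\mathbf q}|\le N^{2p}\sum_{n>6p}\sum_{|\mathbf n|=n}(Cp)^n\Big(\frac{C(\log N)^{L/10}}{\sqrt N}\Big)^{n}\Big(\frac{C(\log N)^{L/10}}{N}\Big)^{p}, \tag{7.23}$$

where we used that the summation over all $\mathbf q$ yields a factor $N^{3p}$. The number of $\mathbf n = (n_1,n_2,\ldots,n_p)$ with $|\mathbf n| = n$ is bounded by $2^{n+p}$, so the last term is bounded by

$$\big(CN(\log N)^{L/10}\big)^p\sum_{n>6p}\Big(\frac{Cp(\log N)^{L/10}}{\sqrt N}\Big)^{n}. \tag{7.24}$$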
Since $p\le(\log N)^{L/10}$ and $L\le\log N/\log\log N$, the sum of the tail terms with $n>6p$ is bounded by $CN^{-5p/2}$ for sufficiently large $N$. Hence, for the bound (4.5), we only have to estimate the terms with $n\le 6p$.

We denote all independent random variables by $\mathbf h = (h_\nu)$ and split them according to the set $Q$ (see (7.6)). That is, we write $\mathbf h = (\mathbf h_1,\mathbf h_2)$ with $\mathbf h_2 = (h_\nu:\nu\in Q)$ and $\mathbf h_1 = (h_\nu:\nu\notin Q)$. Denote the corresponding projections by $\pi_j$, $j = 1,2$, i.e. $\pi_j\mathbf h = \mathbf h_j$. Define

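$$(\Gamma^c)_1 := \pi_1(\Gamma^c),\qquad Y^c := \prod_{\nu\in Q}\big\{h_\nu : |h_\nu|\le(\log N)^{L/10}\sigma_\nu\big\}\subset\mathbb C^Q,\qquad Y := \mathbb C^Q\setminus Y^c. \tag{7.25}$$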

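By definition, $G^{[\alpha]}$ depends only on the variables $\mathbf h_1$. Furthermore, for any $\mathbf h_1\in(\Gamma^c)_1$ there exists $\mathbf h_2$ such that $\mathbf h = (\mathbf h_1,\mathbf h_2)\in\Gamma^c$. In particular, the estimates (7.12) hold for any $\mathbf h_1\in(\Gamma^c)_1$. By definition of $\Gamma^c$, we have

$$\Gamma^c\subset(\Gamma^c)_1\times Y^c, \tag{7.26}$$

and (7.21) holds in the set $(\Gamma^c)_1\times Y^c$. From the resolvent expansion we have, for $i\ne j$ and for $\mathbf h_1\in(\Gamma^c)_1$,

$$G^{[\alpha]}_{ij}(\mathbf h_1) = G^{[\alpha]}_{ij}(\mathbf h) = \big(H^{(\alpha)}-U^{(\alpha)}-z\big)^{-1}_{ij} \tag{7.27}$$

$$= \sum_{n_\alpha=0}^{\infty}\Big[\big(G^{(\alpha)}U^{(\alpha)}\big)^{n_\alpha}G^{(\alpha)}\Big]_{ij}. \tag{7.28}$$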

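Using

$$\mathbf 1(\Gamma^c) = \mathbf 1((\Gamma^c)_1) - \mathbf 1\big((\Gamma^c)_1\times Y^c\setminus\Gamma^c\big) - \mathbf 1((\Gamma^c)_1)\,\mathbf 1(Y), \tag{7.29}$$

we can rewrite $\Phi^{\mathbf n}_{\mathbf q}$ as

$$\Phi^{\mathbf n}_{\mathbf q} = \widetilde\Phi^{\mathbf n}_{\mathbf q} + X^{\mathbf n}_1 + X^{\mathbf n}_2, \tag{7.30}$$

$$\widetilde\Phi^{\mathbf n}_{\mathbf q} := \sum_{\nu\in A(\mathbf q,\mathbf n)}\widetilde\Phi^{\mathbf n}_{\mathbf q,\nu},\qquad \widetilde\Phi^{\mathbf n}_{\mathbf q,\nu} := \mathbb E\,\mathbf 1((\Gamma^c)_1)\prod_{\alpha=1}^p V_{\mathbf q}(\mu^\alpha,\nu^\alpha,n_\alpha)\,\xi(q_\alpha), \tag{7.31}$$

$$X^{\mathbf n}_1 := -\mathbb E\,\mathbf 1((\Gamma^c)_1)\,\mathbf 1(Y)\sum_{\nu}\prod_{\alpha=1}^p V_{\mathbf q}(\mu^\alpha,\nu^\alpha,n_\alpha)\,\xi(q_\alpha), \tag{7.32}$$

$$X^{\mathbf n}_2 := -\mathbb E\,\mathbf 1\big((\Gamma^c)_1\times Y^c\setminus\Gamma^c\big)\sum_{\nu}\prod_{\alpha=1}^p V_{\mathbf q}(\mu^\alpha,\nu^\alpha,n_\alpha)\,\xi(q_\alpha). \tag{7.33}$$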
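The identity (7.29) behind this decomposition uses only the inclusion $\Gamma^c\subset(\Gamma^c)_1\times Y^c$. The following Python sketch checks the identity pointwise on random toy sets. The finite ranges and the random choice of sets are assumptions made purely for illustration.

```python
import random
from itertools import product


def indicator_identity_holds(seed=0, n1=6, n2=5, trials=200):
    """Check 1(G) = 1(G_1) - 1(G_1 x Yc minus G) - 1(G_1) 1(Y) pointwise, where
    G plays the role of Gamma^c, G_1 its projection, Yc a subset of the
    h_2-space and Y its complement (toy finite sets)."""
    rng = random.Random(seed)
    H1, H2 = range(n1), range(n2)
    for _ in range(trials):
        Yc = {h2 for h2 in H2 if rng.random() < 0.6}
        # Gamma^c is chosen inside H1 x Yc, so Gamma^c is contained in (Gamma^c)_1 x Yc
        G = {(h1, h2) for h1, h2 in product(H1, Yc) if rng.random() < 0.5}
        G1 = {h1 for h1, _ in G}                  # the projection pi_1
        for h1, h2 in product(H1, H2):
            lhs = int((h1, h2) in G)
            in_G1 = int(h1 in G1)
            in_Y = int(h2 not in Yc)
            in_mid = int(h1 in G1 and h2 in Yc and (h1, h2) not in G)
            if lhs != in_G1 - in_mid - in_G1 * in_Y:
                return False
    return True


print(indicator_identity_holds())
```

Dropping the inclusion, i.e. letting $\Gamma^c$ contain points with $h_2\in Y$, breaks the identity. That is why (7.26) is recorded before (7.29) is used.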

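Analogously to (7.21)-(7.23), we can bound $X^{\mathbf n}_2$ as follows:

$$\frac{1}{N^p}\sum_{\mathbf q}\sum_{n=0}^{6p}\sum_{|\mathbf n|=n}|X^{\mathbf n}_2|\le(Cp)^{6p}\big(N(\log N)^{L/10}\big)^p\,\mathbb P(\Gamma)\le C\exp\big[-c(\log N)^{\phi L}\big], \tag{7.34}$$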



using the fact that the estimate (7.21) holds even on $(\Gamma^c)_1\times Y^c$, since all $G^{[\alpha]}$ appearing in $V_{\mathbf q}$ depend only on $\{h_\nu:\nu\notin Q\}$. In the last step we used (4.3) and $n\le 6p\le 6(\log N)^{\phi L/2}\le 6(\log N)^{L/10}$. For the other error term we have

$$\frac{1}{N^p}\sum_{\mathbf q}\sum_{n=0}^{6p}\sum_{|\mathbf n|=n}|X^{\mathbf n}_1|\le(Cp)^{6p}\big(N(\log N)^{L/10}\big)^p\exp\big[-c(\log N)^{\phi L}\big]\le C\exp\big[-c(\log N)^{\phi L}\big]. \tag{7.35}$$

Here we have used the following bound for sufficiently large $L$. The integration of $\mathbf h_2$ over the set $Y$, i.e. an $O(p)$-moment of the random variables $h_\nu$, $\nu\in Q$, in the regime $|h_\nu|>(\log N)^{L/10}\sigma_\nu$, is bounded by $C\exp[-c(\log N)^{\phi L}]$ with some positive $\phi$ depending on $\vartheta$. This follows from the subexponential decay (2.17) and from $p\le(\log N)^{\phi L/2}$. In the estimate (7.35) we also used that (7.12) holds on $(\Gamma^c)_1$; this controls the $G^{[\alpha]}$ factors that remain from the $V_{\mathbf q}$ terms after integrating out the random variables $h_\nu$, $\nu\in Q$.

Collecting the estimates from (7.30), (7.34) and (7.35), we have

$$\frac{1}{N^p}\sum_{\mathbf q}\sum_{n=0}^{6p}\sum_{|\mathbf n|=n}|\Phi^{\mathbf n}_{\mathbf q}|\le\frac{1}{N^p}\sum_{\mathbf q}\sum_{n=0}^{6p}\sum_{|\mathbf n|=n}|\widetilde\Phi^{\mathbf n}_{\mathbf q}| + C\exp\big[-c(\log N)^{\phi L}\big]. \tag{7.36}$$

The last error term can be absorbed into the $N^{-p}$ term in (4.5), using that $p\le(\log N)^{\phi L/2}$. Hence we only have to estimate the contribution of $\widetilde\Phi^{\mathbf n}_{\mathbf q}$. The key observation is that

$$\mathbb E_{h_\nu}\,\mathbf 1((\Gamma^c)_1)\,\xi(q_\alpha) = 0 \tag{7.37}$$

for any $\alpha = 1,2,\ldots,p$ and any $\nu\in Q$ labelling one of the two $h$ factors of $\xi(q_\alpha)$. Furthermore, any resolvent $G^{[\alpha]}$ appearing explicitly in

$$\prod_{\alpha=1}^p V_{\mathbf q}(\mu^\alpha,\nu^\alpha,n_\alpha) = \prod_{\alpha=1}^p(-1)^{n_\alpha}G^{[\alpha]}_{\mu^\alpha_1}h_{\nu^\alpha_1}G^{[\alpha]}_{\mu^\alpha_2}\cdots h_{\nu^\alpha_{n_\alpha}}G^{[\alpha]}_{\mu^\alpha_{n_\alpha+1}} \tag{7.38}$$

is independent of every $h_\nu$, $\nu\in Q$. Therefore the expectation in (7.31) is nonzero only if, for each $\nu\in Q$, one of two scenarios holds:

1. $h_\nu$ (or its transpose $h_{\nu^t}$) appears explicitly in (7.38). This imposes restrictions on the indices of the two resolvents $G^{[\alpha]}$ neighboring $h_\nu$ in (7.38). We will infer that some of these resolvents must be off-diagonal, and they can then be estimated by $\Lambda_o$.
2. $h_\nu$ (or its transpose $h_{\nu^t}$) appears in two different factors $\xi(q_\alpha)$ in (7.31). This restricts the total combinatorics of the summation over the indices $\mathbf q$ in (7.36), and the gain can also be expressed as a power of $N^{-1/2}$.

In the next step we set up a graphical representation to keep track of all possible situations.

7.3.2 Combinatorics 

Recall that $\mathbf q$ is a $3\times p$ matrix with $3p$ slots. The estimate of $\widetilde\Phi^{\mathbf n}_{\mathbf q}$ defined in the previous section depends on the structure of the indices $\mathbf q = (q^j_\alpha)$; more precisely, it depends on which of the indices $q^j_\alpha$ coincide. The relevant structure of these coincidences will be encoded by a graph $\mathcal G(\mathbf q)$, defined below.

Roughly speaking (with some modifications specified below), the vertex set of $\mathcal G(\mathbf q)$ is the set of slots of the matrix $\mathbf q$. Two vertices $(j,\alpha)$ and $(i,\beta)$ are connected by an edge if the corresponding indices coincide, $q^j_\alpha = q^i_\beta$. The summation over $\mathbf q$ on the right side of (7.36) is then performed in two steps: first we sum over all possible graphs, then over all $\mathbf q$'s compatible with a given graph, i.e. we write

$$\sum_{\mathbf q} = \sum_{G}\ \sum_{\mathbf q:\,\mathcal G(\mathbf q) = G}, \tag{7.39}$$

where the first summation is over all graphs with at most $3p$ vertices. In fact, only certain special graphs $G$ are compatible with a choice of indices $\mathbf q$ occurring in our expansion, and their number will be bounded by $p^{Cp}$.

The reason for this resummation is that the size of $\widetilde\Phi^{\mathbf n}_{\mathbf q}$ is essentially given by the number of off-diagonal resolvents in the expansion (7.31). Only those terms count that do not vanish after taking the expectation (see (7.51) below). This number can be estimated via the coincidence graph.

We now define the graph $\mathcal G(\mathbf q)$, which describes the relevant coincidence structure of $\mathbf q$, by a four-step procedure. Strictly speaking, the graph is defined on a subset of the $3p$ vertices (or slots in the matrix), labelled by coordinates $(j,\alpha)$ with $1\le j\le 3$ and $1\le\alpha\le p$. We say that a vertex $(j,\alpha)$ has the value $r$ if $q^j_\alpha = r$; that is, the index $q^j_\alpha$ assigned to the vertex $(j,\alpha)$ is also called the value of that vertex. When there is no risk of confusion, we often refer to $q^j_\alpha$ instead of the vertex $(j,\alpha)$. For example, "the indices $q^j_\alpha$ and $q^i_\beta$ are connected by an edge" means that the vertices $(j,\alpha)$ and $(i,\beta)$ are connected.

Let $\ell(\mathbf q)$ denote the number of different location indices, i.e.,

$$\ell = \ell(\mathbf q) := \big|\{q^1_\alpha : 1\le\alpha\le p\}\big|, \tag{7.40}$$

where $|\cdot|$ denotes the cardinality of the set, disregarding multiplicity. We group together all columns with the same location index; the union of these columns is called a group. Let $m_1,m_2,\ldots,m_\ell$ denote the multiplicities of the groups, i.e., the numbers of columns with the same location index. We clearly have

$$\sum_{s=1}^{\ell}m_s = p. \tag{7.41}$$

We start with the matrix $\mathbf q$ and perform the following operations to obtain $\mathcal G(\mathbf q)$. Steps 1 and 2 specify the vertex set of $\mathcal G(\mathbf q)$ by removing some of the original $3p$ vertices. Steps 3 and 4 specify the edges of $\mathcal G(\mathbf q)$. After each step we give an intuitive explanation.
Step 1. If $q^2_\alpha = q^3_\alpha$, we replace $q^3_\alpha$ by $*$, and the vertex $(3,\alpha)$ is not part of the graph $\mathcal G(\mathbf q)$. In the matrix, we put a $*$ in its location. We call $q^2_\alpha$ a duplex and mark it with a subscript $d$.

Explanation: If $q^2_\alpha = q^3_\alpha$, then the two $h$ factors in $\xi(q_\alpha)$ are the same (up to transposition). This coincidence has to be treated separately: because $\mathbb E\xi(q_\alpha) = 0$, it does not automatically lead to nonzero expectation. It is therefore easier to merge the vertices $(2,\alpha)$ and $(3,\alpha)$ into one vertex.

Step 2. For $\alpha\ne\beta$ and any $i,j\in\{2,3\}$, we call the vertices $(j,\alpha)$ and $(i,\beta)$ (and the corresponding indices $q^j_\alpha$ and $q^i_\beta$) twins if $q^j_\alpha = q^1_\beta$ and $q^1_\alpha = q^i_\beta$. We replace $q^j_\alpha$ and $q^i_\beta$ by $t$ to indicate a twin, but we do not change the location indices. Vertices marked $t$ are not part of the graph $\mathcal G(\mathbf q)$. By the restriction (7.5), $q^j_\alpha\ne q^1_\alpha$, and thus $q^1_\beta = q^j_\alpha\ne q^1_\alpha$. Hence twins can only be formed in different groups, i.e. in columns with different location indices.

Explanation: This is the situation where the $h$ factors of two different $\xi$'s coincide,

$$\xi(q_\alpha) = h_{q^1_\alpha q^2_\alpha}h_{q^3_\alpha q^1_\alpha}-\delta_{q^2_\alpha q^3_\alpha}\sigma^2_{q^1_\alpha q^2_\alpha}\quad\text{and}\quad\xi(q_\beta) = h_{q^1_\beta q^2_\beta}h_{q^3_\beta q^1_\beta}-\delta_{q^2_\beta q^3_\beta}\sigma^2_{q^1_\beta q^2_\beta},\qquad\alpha\ne\beta,$$

e.g. $q^1_\alpha = q^2_\beta$ and $q^2_\alpha = q^1_\beta$. Such a coincidence results in nonzero expectation with respect to $h_{q^1_\alpha q^2_\alpha}$, without forcing $h_{q^1_\alpha q^2_\alpha}$ to appear also in the resolvent expansions, i.e. in one of the $V_{\mathbf q}$ factors in (7.38). This means that $h_{q^1_\alpha q^2_\alpha}$ may not generate an additional off-diagonal resolvent element. We remove such vertices from the graph to allow a more uniform treatment of the rest, and we account for the twins separately.

Step 3. Two vertices are connected by an edge in $\mathcal G(\mathbf q)$ if the indices assigned to them are the same, except when both vertices are in the first row of the matrix. That is, edges connect vertices with identical indices, except that there is no edge between two location indices.

Explanation: The location index plays a different role than the two nonlocation indices. Coincidences among location indices have therefore been taken into account separately, through the concept of groups.

Step 4. We add an edge between a duplex $(q^2_\alpha)_d$ and its location index $q^1_\alpha$ if the group that the duplex belongs to has multiplicity one, i.e. if the duplex is isolated.

Explanation: This is a purely technical convenience. Later we will consider connected components of $\mathcal G(\mathbf q)$. Isolated duplexes are treated separately (see Case 1 in the proof of Proposition 7.1 below). Artificially joining a duplex and its location vertex into one connected component simplifies the argument of Lemma 7.2.

We remark that the number of different graphs arising from this procedure is bounded by $p^{Cp}$. This is because $\mathcal G(\mathbf q)$ has the following special structure. Its vertices are partitioned into equivalence classes according to the common value of their indices, and any two vertices within an equivalence class are connected by an edge unless both are location vertices. The number of partitions of the vertices is at most $p^{Cp}$. In addition, there are edges between duplexes and their location vertices when the corresponding location index appears only once in $\mathbf q$, but these additional edges contribute at most a factor $2^p$ to the combinatorics.
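The partition count can be made concrete. A set of $k$ vertices has $B_k$ partitions (the Bell number), and $B_k\le k^k$, since each partition is determined by the map sending every vertex to the smallest vertex of its block. The Python sketch below is our own illustration. It computes Bell numbers with the Bell triangle and checks $B_{3p}\le(3p)^{3p}$ for small $p$. Combined with the factor $2^p$ for the Step 4 edges, this is the type of $p^{Cp}$ bound used above.

```python
def bell_numbers(n_max):
    """Bell numbers B_0, ..., B_{n_max} via the Bell triangle: each row starts
    with the last entry of the previous row, and each further entry is the sum
    of its left neighbour and the entry above that neighbour."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


B = bell_numbers(30)
# At most 3p vertices: compare B_{3p} with the crude bound (3p)^{3p}.
print(all(B[3 * p] <= (3 * p) ** (3 * p) for p in range(1, 11)))
```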

Having defined $\mathcal G(\mathbf q)$, the next step is to assign a weight to all vertices as follows.

Definition 7.1 (Weight of vertices and groups in $\mathcal G(\mathbf q)$)

(i) In a group with multiplicity $m_s = 1$, each vertex has weight zero.

(ii) In a group with multiplicity $m_s>1$, we assign weight 1 to each duplex in the group; all other nonlocation vertices in the group have weight 1/2.

(iii) The total weight of a group is the sum of the weights of its vertices.

(iv) The total weight $W = W(\mathbf q)$ of the graph is the sum of the weights of all vertices.

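Each column contributes at most 1 to the weight of its group. Hence the total weight of a group is zero if $m_s = 1$, and at most $m_s\le 2(m_s-1)$ if $m_s\ge 2$. By (7.41), the total weight of the graph therefore satisfies

$$W\le\sum_{s=1}^{\ell}2(m_s-1) = 2(p-\ell). \tag{7.42}$$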

If all location indices are distinct, then all weights are zero. In this case each nonlocation index in $\mathcal G(\mathbf q)$ forces a new $h$ factor in $V_{\mathbf q}$, see (7.38); this statement uses that twins have been taken out of the graph.

If some location indices coincide, i.e. some group has multiplicity larger than one, the situation changes. Coincidences of nonlocation indices within such a group may yield nonzero expectation without forcing a corresponding $h$ factor in $V_{\mathbf q}$. This may shorten the expansion (7.38) and hence reduce the total number of off-diagonal elements. The weight measures the maximal reduction of off-diagonal elements in (7.38) caused by the larger multiplicity, compared with the case of multiplicity one.

Definition 7.2 (Independent nonlocation indices) Denote by $N_{\rm ind}$ the number of different nonlocation indices that do not coincide with any location index, i.e.,

$$N_{\rm ind} = N_{\rm ind}(\mathbf q) := \big|\{q^j_\alpha : 2\le j\le 3,\ 1\le\alpha\le p\}\setminus\{q^1_\alpha : 1\le\alpha\le p\}\big|, \tag{7.43}$$

where again $|\cdot|$ denotes the cardinality of the set, disregarding multiplicity. The elements of this set are called independent nonlocation indices.

Note that $N_{\rm ind}$ is the actual number of different indices $q^2_\alpha$ and $q^3_\alpha$ to be summed over in the second sum on the right hand side of (7.39). Together with the number $\ell$ of groups, i.e. the number of different location indices, this bounds the number of terms in the summation over $\mathbf q$ with $\mathcal G(\mathbf q) = G$ by $N^{N_{\rm ind}+\ell}$.

We give an example to illustrate this procedure and these definitions. Let $p = 13$ and

$$\mathbf q = \begin{pmatrix}1&2&3&3&4&4&5&5&5&5&6&7&8\\ 10&1&2&9&7&15&9&9&9&9&2&4&14\\ 10&11&5&6&7&12&9&9&13&13&2&12&14\end{pmatrix}. \tag{7.44}$$

After the first step we get

$$\begin{pmatrix}1&2&3&3&4&4&5&5&5&5&6&7&8\\ 10_d&1&2&9&7_d&15&9_d&9_d&9&9&2_d&4&14_d\\ *&11&5&6&*&12&*&*&13&13&*&12&*\end{pmatrix}. \tag{7.45}$$

After the second step we have

$$\begin{pmatrix}1&2&3&3&4&4&5&5&5&5&6&7&8\\ 10_d&1&2&9&t&15&9_d&9_d&9&9&2_d&t&14_d\\ *&11&5&6&*&12&*&*&13&13&*&12&*\end{pmatrix}. \tag{7.46}$$
In this example, the graph $\mathcal G(\mathbf q)$ has 31 vertices, identified with the slots of the matrix in (7.46) that contain numbers; the slots with stars and $t$'s are not vertices of $\mathcal G(\mathbf q)$. The different location indices are 1, 2, 3, 4, 5, 6, 7, 8, and the independent nonlocation indices are 9, 10, 11, 12, 13, 14, 15. Thus $\ell = 8$ and $N_{\rm ind} = 7$. The multiplicities of the groups are $m_1 = m_2 = m_6 = m_7 = m_8 = 1$, $m_3 = m_4 = 2$ and $m_5 = 4$. For simplicity, we chose $1,2,\ldots,8$ as the eight different location indices and used them to label the groups as well. We also used the seven consecutive numbers $9,\ldots,15$ for the independent nonlocation indices. In general, both location and nonlocation indices can be arbitrary numbers between 1 and $N$.

For brevity, we often use the index associated with a vertex to refer to that vertex. For example, the index $2_d$ in (7.46) refers to the vertex $(2,11)$, since $q^2_{11} = 2_d$. This notation is sometimes ambiguous (e.g., several vertices have the index 9); in such cases we will be specific.

All vertices with identical indices are connected by an edge, except that there is never an edge between two vertices in the first row. In addition:

- There is an edge between $2_d$ and $6$, more precisely between the vertices $(2,11)$ and $(1,11)$. Similarly, $10_d$ is joined to $1$ and $14_d$ to $8$.
- There is no edge between the duplexes $9_d$ and their location index 5. They belong to a group with multiplicity larger than one (namely four, due to the four location indices 5).

For the weights: the vertices with indices 2, 5, 9, 6 (common location index 3), the vertices with indices 12, 15 (location index 4), and the two 9's and two 13's with common location index 5 all receive weight 1/2. Both $9_d$'s have weight 1, and all other vertices have weight zero.

Notice that the index pair $(5,9)$ appears twice, but these are not twins, since there are no twins inside a group. Similarly, the two pairs $(5,9_d)$ are not twins.

We will consider connected components of this graph. Due to the special rule involving duplexes, a connected component may contain different indices. For example,

$$C = \{(1,2),(2,3),(2,11),(3,4),(1,11)\} \tag{7.47}$$

is a connected component in (7.46), since $q^1_2 = q^2_3 = q^2_{11} = 2$, $q^3_4 = q^1_{11} = 6$, and $q^1_{11} = 6$ is connected to $q^2_{11} = 2_d$. With a slight abuse of notation, we may encode the elements of $C$ by the indices $q^i_\alpha$ instead of the vertices $(i,\alpha)$ and write $C = \{2({\rm loc.}),2,2_d,6,6({\rm loc.})\}$, where (loc.) marks a location index. The list of all connected components in (7.46) is

$$\{1({\rm loc.}),10_d,1\},\ \{11\},\ \{2({\rm loc.}),2,2_d,6,6({\rm loc.})\},\ \{3\},\ \{3\},\ \{4\},\ \{4\},\ \{7\},$$

$$\{5,5({\rm loc.}),5({\rm loc.}),5({\rm loc.}),5({\rm loc.})\},\ \{15\},\ \{12,12\},\ \{9_d,9_d,9,9,9\},\ \{13,13\},\ \{8,14_d\}, \tag{7.48}$$

in this shorter and somewhat ambiguous index notation.
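The four steps can be replayed mechanically on the example (7.44). The Python sketch below is our own illustration, and the helper names are hypothetical. It builds the vertices and edges of $\mathcal G(\mathbf q)$ following Steps 1-4, computes the weights of Definition 7.1 and the connected components, and reproduces the counts stated above. Vertices are 1-based pairs (row, column), as in (7.47).

```python
from collections import Counter
from fractions import Fraction

# Example (7.44): row 1 holds the location indices q^1_a, rows 2 and 3 hold q^2_a and q^3_a.
Q = [
    [1, 2, 3, 3, 4, 4, 5, 5, 5, 5, 6, 7, 8],
    [10, 1, 2, 9, 7, 15, 9, 9, 9, 9, 2, 4, 14],
    [10, 11, 5, 6, 7, 12, 9, 9, 13, 13, 2, 12, 14],
]


def build_graph(Q):
    """Steps 1-4; vertices are 1-based pairs (row, column)."""
    cols = range(1, len(Q[0]) + 1)
    mult = Counter(Q[0])                                   # group multiplicities m_s

    def val(v):
        return Q[v[0] - 1][v[1] - 1]

    # Step 1: in a duplex column (q^2 == q^3) the row-3 vertex is dropped.
    duplex = {a for a in cols if Q[1][a - 1] == Q[2][a - 1]}
    vertices = {(r, a) for r in (1, 2, 3) for a in cols if not (r == 3 and a in duplex)}
    # Step 2: twins (q^j_a == q^1_b and q^1_a == q^i_b, a != b) are removed.
    nonloc = [v for v in vertices if v[0] > 1]
    twins = {v for v in nonloc for w in nonloc
             if v[1] != w[1] and val(v) == Q[0][w[1] - 1] and Q[0][v[1] - 1] == val(w)}
    vertices -= twins
    # Step 3: equal values are joined, except two location (row-1) vertices.
    edges = {frozenset((v, w)) for v in vertices for w in vertices
             if v < w and val(v) == val(w) and not (v[0] == 1 and w[0] == 1)}
    # Step 4: a duplex in a group of multiplicity one is joined to its location vertex.
    edges |= {frozenset(((2, a), (1, a))) for a in duplex
              if (2, a) in vertices and mult[Q[0][a - 1]] == 1}
    return vertices, edges, duplex, val


def components(vertices, edges):
    """Connected components via union-find with path halving."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for e in edges:
        v, w = tuple(e)
        parent[find(v)] = find(w)
    comps = {}
    for v in vertices:
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())


def weight(v, Q, duplex):
    """Definition 7.1: location vertices and multiplicity-one groups have weight 0."""
    row, a = v
    if row == 1 or Counter(Q[0])[Q[0][a - 1]] == 1:
        return Fraction(0)
    return Fraction(1) if a in duplex else Fraction(1, 2)


def n_ind(C, Q, val):
    """Independent nonlocation indices among the vertices of C (cf. (7.43))."""
    return len({val(v) for v in C if v[0] > 1} - set(Q[0]))


vertices, edges, duplex, val = build_graph(Q)
comps = components(vertices, edges)
W = sum(weight(v, Q, duplex) for v in vertices)
print(len(vertices), len(set(Q[0])), W, len(comps))  # 31 8 7 14
```

In this run the total weight is $W = 7\le 2(p-\ell) = 10$, consistent with (7.42). Every component has at most one independent nonlocation index, which is the content of Lemma 7.2 below.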
7.3.3 Estimates on the integrals 

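We now estimate $\widetilde\Phi^{\mathbf n}_{\mathbf q,\nu}$ from (7.31). Let $O = O(\mathbf q,\mathbf n,\nu)$ be the number of off-diagonal Green functions appearing in the expansion on the right hand side of (7.31), i.e., in

$$\prod_{\alpha=1}^p V_{\mathbf q}(\mu^\alpha,\nu^\alpha,n_\alpha)\,\xi(q_\alpha) = \prod_{\alpha=1}^p(-1)^{n_\alpha}G^{[\alpha]}_{\mu^\alpha_1}h_{\nu^\alpha_1}G^{[\alpha]}_{\mu^\alpha_2}\cdots h_{\nu^\alpha_{n_\alpha}}G^{[\alpha]}_{\mu^\alpha_{n_\alpha+1}}\,\xi(q_\alpha) \tag{7.49}$$

(see (7.16) and (7.17)). Define

$$\Lambda := \max_{\alpha=1,\ldots,p;\ i\ne j}\big|G^{[\alpha]}_{ij}\big| \tag{7.50}$$

to be the maximum of the off-diagonal elements of the Green functions $G^{[\alpha]}$. Note that $\Lambda$ is independent of the random variables $h_\nu$, $\nu\in Q$. In particular, the bound $\Lambda\le C/(\log N)^2\le 1$ from (7.12) holds not only on $\Gamma^c$ but also on $(\Gamma^c)_1$. Then, with $O = O(\mathbf q,\mathbf n,\nu)$ and using $n\le 6p$, we have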



$$\Big|\mathbb E\,\mathbf 1((\Gamma^c)_1)\prod_{\alpha=1}^p V_{\mathbf q}(\mu^\alpha,\nu^\alpha,n_\alpha)\,\xi(q_\alpha)\Big|\le N^{-n/2-p}(Cp)^{cp}\,\mathbb E\big[\mathbf 1((\Gamma^c)_1)\Lambda^O\big], \tag{7.51}$$

where, for the expectation of the random variables $h_\nu$, $\nu\in Q$, we have used estimates of the form

$$\mathbb E|h_1|^{a_1}\cdots|h_k|^{a_k}\le\big(Cm^cN^{-1/2}\big)^m,\qquad m := \sum_j a_j, \tag{7.52}$$

for any nonnegative integers $a_j$, where the constant $C$ depends only on $\vartheta$. The total number of $h$ factors appearing in (7.49) is $n_1+n_2+\cdots+n_p+2p = n+2p$. By (7.52), their expectation can be bounded in terms of their total number $\sum_j a_j$, irrespective of how this total is distributed among the individual exponents $a_1,a_2,\ldots,a_k$. Thus $N^{-1/2}$ appears to the power $n+2p$ in (7.51).
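The point of (7.52) is that only the total degree $m$ matters. Here is a sanity check in the simplest special case. We assume independent real Gaussian entries with $\sigma = N^{-1/2}$, which have subexponential decay. The sketch verifies (7.52) with the illustrative constants $C = 1$, $c = 1/2$, using $\mathbb E|g|^a = 2^{a/2}\Gamma(\tfrac{a+1}{2})/\sqrt\pi$ for a standard Gaussian $g$. These constants are our choice for this case, not the constants of the paper.

```python
import math
from itertools import product


def gaussian_abs_moment(a, sigma):
    """E|h|^a for a real centered Gaussian h with variance sigma^2."""
    return sigma ** a * 2 ** (a / 2) * math.gamma((a + 1) / 2) / math.sqrt(math.pi)


def moment_bound_holds(exponents, N, C=1.0, c=0.5):
    """Check E|h_1|^{a_1}...|h_k|^{a_k} <= (C m^c N^{-1/2})^m with m = sum(a_j),
    for independent Gaussians with sigma = N^{-1/2} (an assumption of this sketch)."""
    sigma = 1 / math.sqrt(N)
    m = sum(exponents)
    lhs = math.prod(gaussian_abs_moment(a, sigma) for a in exponents)
    rhs = (C * m ** c * N ** -0.5) ** m
    return lhs <= rhs * (1 + 1e-9)


print(all(moment_bound_holds(e, N) for e in product(range(6), repeat=3) for N in (10, 100, 1000)))
```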

We also recall that the number of terms in the summation over $\nu\in A(\mathbf q,\mathbf n)$ in (7.51) is bounded by $(4p)^n$; see the remark below (7.18).

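Since $\Lambda\le 1$ on the set $(\Gamma^c)_1$, we also have the trivial estimate

$$\Lambda^O\le N^{[p-O/2]_+}\big[\Lambda^2+N^{-1}\big]^p,$$

where $[\,\cdot\,]_+$ denotes the positive part. Thus the main term in (7.36) is estimated as

$$\frac{1}{N^p}\sum_{\mathbf q}\sum_{n=0}^{6p}\sum_{|\mathbf n|=n}|\widetilde\Phi^{\mathbf n}_{\mathbf q}|\le(Cp)^{cp}\,\mathbb E\Big[\big(\mathbf 1((\Gamma^c)_1)\Lambda^2+N^{-1}\big)^p\Big]\sum_{G}\ \sum_{\mathbf q:\mathcal G(\mathbf q)=G}\ \sum_{n=0}^{6p}\ \sum_{|\mathbf n|=n}\ \sum_{\nu\in A(\mathbf q,\mathbf n)}N^{-2p-n/2+[p-O/2]_+}\,\mathbf 1\big(\widetilde\Phi^{\mathbf n}_{\mathbf q,\nu}\ne 0\big). \tag{7.53}$$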
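The trivial estimate $\Lambda^O\le N^{[p-O/2]_+}[\Lambda^2+N^{-1}]^p$ has two cases. When $O\ge 2p$ it follows from $\Lambda\le 1$. When $O<2p$ it follows from $(\Lambda^2+N^{-1})^p\ge\Lambda^{O}N^{-(p-O/2)}$. Below is a quick numerical check in Python over a small grid; the grid values are chosen arbitrarily.

```python
def trivial_bound_holds(lam, O, p, N):
    """Check Lambda^O <= N^{[p - O/2]_+} (Lambda^2 + 1/N)^p; valid for 0 <= Lambda <= 1."""
    lhs = lam ** O
    rhs = N ** max(p - O / 2, 0) * (lam ** 2 + 1 / N) ** p
    return lhs <= rhs * (1 + 1e-12)           # small relative tolerance for rounding


ok = all(
    trivial_bound_holds(lam, O, p, N)
    for lam in (0.0, 0.01, 0.1, 0.5, 1.0)
    for O in range(0, 13)
    for p in range(1, 5)
    for N in (10, 100, 1000)
)
print(ok)
```

The restriction $\Lambda\le1$ matters only in the case $O>2p$. For example, $\Lambda = 2$, $O = 4$, $p = 1$ violates the inequality.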

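From (7.29) we have the decomposition

$$\mathbf 1((\Gamma^c)_1) = \mathbf 1(\Gamma^c) + \mathbf 1\big((\Gamma^c)_1\times Y^c\setminus\Gamma^c\big) + \mathbf 1((\Gamma^c)_1)\,\mathbf 1(Y). \tag{7.54}$$

Since $\Lambda\le 1$ on the set $(\Gamma^c)_1$, the contributions from the sets $(\Gamma^c)_1\times Y^c\setminus\Gamma^c$ and $(\Gamma^c)_1\times Y$ can be estimated by $C\exp[-c(\log N)^{\phi L}]$, in the same way as in (7.34) and (7.35). Finally, we can use

$$\Lambda\le 2\Lambda_o$$

on the set $\Gamma^c\subset(\Gamma^c)_1\times Y^c$ (see (7.12)). Thus we can replace $\mathbb E[(\mathbf 1((\Gamma^c)_1)\Lambda^2+N^{-1})^p]$ in (7.53) by $4^p\,\mathbb E[(\mathbf 1(\Gamma^c)\Lambda_o^2+N^{-1})^p]$, with a negligible error $C\exp[-c(\log N)^{\phi L}]$.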

By (7.40) and Definition 7.2, the total number of different summation indices $\mathbf q$ in (7.53) is $N_{\rm ind}+\ell$. We will prove that

$$2p+n+O\ge 2N_{\rm ind}+2\ell \tag{7.55}$$

and

$$4p+n\ge 2N_{\rm ind}+2\ell \tag{7.56}$$

hold for any $\mathbf q$, $\mathbf n$ and $\nu$ with $\widetilde\Phi^{\mathbf n}_{\mathbf q,\nu}\ne 0$. The summations over $G$, $n$, $\mathbf n$ and $\nu$ give a factor of at most $p^{Cp}$. Therefore these two inequalities imply that (7.53) is bounded by the right hand side of (4.5). This proves Lemma 4.1, assuming (7.55) and (7.56).
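To spell out this last step: for a fixed graph there are at most $N^{N_{\rm ind}+\ell}$ choices of $\mathbf q$. The $N$-exponent $E$ of the resulting contribution to (7.53) is shown below, and the two inequalities make it nonpositive in both cases.

```latex
\[
  E := N_{\rm ind} + \ell - 2p - \frac n2 + \Big[p - \frac O2\Big]_+
  = \begin{cases}
      N_{\rm ind} + \ell - 2p - \dfrac n2 \;\le\; 0, & O \ge 2p \quad\text{(by (7.56))},\\[2mm]
      N_{\rm ind} + \ell - p - \dfrac{n+O}{2} \;\le\; 0, & O < 2p \quad\text{(by (7.55))}.
    \end{cases}
\]
```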

We now prove (7.55) and (7.56). By (7.42), the total weight $W$ of the graph satisfies $W\le 2(p-\ell)$. Hence (7.55) is a consequence of the following proposition.

Proposition 7.1 For any $\mathbf q$, $\mathbf n$ and $\nu$ such that $\widetilde\Phi^{\mathbf n}_{\mathbf q,\nu}\ne 0$, we have

$$W(\mathbf q)+|\mathbf n|+O(\mathbf q,\mathbf n,\nu)\ge 2N_{\rm ind}(\mathbf q). \tag{7.57}$$

Proof. We consider the connected components $C$ of the graph $\mathcal G(\mathbf q)$. A connected component consisting of a single location index is called trivial; we consider only nontrivial components. Since location indices are never directly connected by an edge, nontrivial components always contain at least one nonlocation vertex. We will prove that (7.57) holds for each nontrivial connected component and then sum these inequalities.

To formulate the statement precisely, we need some notation. We fix $\mathbf q$, $\mathbf n$ and $\nu\in A(\mathbf q,\mathbf n)$; all quantities below depend on these parameters.

For each nontrivial connected component $C$ of $\mathcal G(\mathbf q)$, let $I_C$ denote the set of all nonlocation indices appearing in $C$, i.e.,

$$I_C := \{q^i_\alpha : (i,\alpha)\in C,\ i = 2,3\}. \tag{7.58}$$

For the purpose of $I_C$, we do not distinguish between indices with or without a duplex subscript $d$. Let $L_C$ denote the set of all labels associated with $C$ together with their transposes $\nu^t$, where $\nu^t = (b,a)$ if $\nu = (a,b)$, i.e.,

$$L_C := \{(q^1_\alpha,q^i_\alpha) : (i,\alpha)\in C,\ i = 2,3\}\cup\{(q^i_\alpha,q^1_\alpha) : (i,\alpha)\in C,\ i = 2,3\}. \tag{7.59}$$

For example, $L_C = \{(2,3),(3,2),(6,2),(2,6),(3,6),(6,3)\}$ for the connected component $C$ from (7.47). Let

$$n(C) = n(C;\mathbf q,\mathbf n,\nu) := \sum_{\alpha=1}^p\sum_{m=1}^{n_\alpha}\mathbf 1(\nu^\alpha_m\in L_C)$$

be the total number of $h_\nu$-factors with $\nu\in L_C$ appearing in the expansion (7.49), not counting the $h$ factors from $\prod_\alpha\xi(q_\alpha)$. Finally, we define $W(C) = W(C;\mathbf q)$ as the total weight of the component $C$, i.e. the sum of the weights of the vertices in $C$.

The following key quantity is used to count the off-diagonal resolvent matrix elements appearing in the expansion.

Definition 7.3 For $a\in I_C$, let

$$2O(a) := \sum_{\alpha=1}^p\sum_{m=1}^{n_\alpha+1}\Big[\mathbf 1\big([\mu^\alpha_m]_1 = a,\ [\mu^\alpha_m]_2\ne a\big) + \mathbf 1\big([\mu^\alpha_m]_2 = a,\ [\mu^\alpha_m]_1\ne a\big)\Big],$$

i.e., $2O(a)$ is the number of times that $a$ appears as one of the two indices of an off-diagonal Green function in the expansion (7.49). Let

$$O(C) = O(C;\mathbf q,\mathbf n,\nu) := \sum_{a\in I_C}O(a), \tag{7.60}$$

i.e., $2O(C)$ is the number of times that an index associated with $C$ appears in an off-diagonal Green function in (7.49).
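Definition 7.3 is a simple count over the chains of resolvent index pairs $\mu^\alpha_m$. The sketch below (hypothetical helper names) computes $2O(a)$ and the total number $O$ of off-diagonal entries for two chains of the shape $G_{km}hG_{jj}hG_{nl}$ and $G_{mk}hG_{ii}hG_{ln}$. Every off-diagonal entry has two ends, so summing $2O(a)$ over all indices that occur gives $2O$.

```python
def two_O(chains, a):
    """2*O(a) from Definition 7.3: how often index a is one end of an
    off-diagonal resolvent entry G_{xy} (x != y), summed over all chains."""
    return sum(
        int(x == a and y != a) + int(y == a and x != a)
        for mus in chains
        for x, y in mus
    )


def n_off_diagonal(chains):
    """Total number O of off-diagonal resolvent entries in all chains."""
    return sum(x != y for mus in chains for x, y in mus)


# Index pairs mu_m of the two chains G_{km} h G_{jj} h G_{nl} and G_{mk} h G_{ii} h G_{ln}
chains = [[("k", "m"), ("j", "j"), ("n", "l")],
          [("m", "k"), ("i", "i"), ("l", "n")]]
print(n_off_diagonal(chains), {a: two_O(chains, a) for a in "klmn"})
```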
Note that we do not directly count the total number $O$ of off-diagonal resolvent matrix elements. Instead, we count how often a fixed nonlocation index contributes to an off-diagonal Green function factor. This determines how much each nonlocation index contributes to off-diagonal matrix elements, and allows us to perform the estimates for each component separately.

By definition of the edges in the graph, two different nontrivial components $C_1$, $C_2$ have disjoint sets of nonlocation indices, $I_{C_1}\cap I_{C_2} = \emptyset$. As a corollary, and since the twins are eliminated, the sets $L_C$ for different components are also disjoint. Hence, for any fixed $\mathbf q$, $\mathbf n$ and $\nu$, we have

$$\sum_C n(C;\mathbf q,\mathbf n,\nu)\le|\mathbf n|,\qquad\sum_C O(C;\mathbf q,\mathbf n,\nu)\le O(\mathbf q,\mathbf n,\nu), \tag{7.61}$$

where the summations are over all nontrivial connected components. Strict inequality can occur, since some indices are left out in twins. Moreover, we define

$$N_{\rm ind}(C) = N_{\rm ind}(C;\mathbf q) := \big|\{q^j_\alpha : 2\le j\le 3,\ 1\le\alpha\le p,\ (j,\alpha)\in C\}\setminus\{q^1_\alpha : 1\le\alpha\le p\}\big|$$

to be the number of independent nonlocation indices in the component $C$. This is the same concept as $N_{\rm ind}(\mathbf q)$ defined in (7.43), restricted to a fixed component $C$. We clearly have

$$\sum_C W(C) = W,\qquad\sum_C N_{\rm ind}(C;\mathbf q) = N_{\rm ind}(\mathbf q). \tag{7.62}$$

We will prove below that (7.57) holds in each nontrivial component $C$, i.e. for $\widetilde\Phi^{\mathbf n}_{\mathbf q,\nu}\ne 0$ we have

$$W(C;\mathbf q)+n(C;\mathbf q,\mathbf n,\nu)+O(C;\mathbf q,\mathbf n,\nu)\ge 2N_{\rm ind}(C;\mathbf q). \tag{7.63}$$

Then (7.57) follows from (7.61) and (7.62).

Lemma 7.2 Let $C$ be a nontrivial connected component of $\mathcal G(\mathbf q)$. Then $N_{\rm ind}(C)\le 1$.

Proof. Suppose that $C$ contains at least two different independent nonlocation indices $q^j_\alpha\ne q^i_\beta$. Consider a path in $\mathcal G(\mathbf q)$ connecting their vertices $W_1 = (j,\alpha)$ and $W_2 = (i,\beta)$. Along this path there must be two consecutive vertices whose indices differ. By the construction of $\mathcal G(\mathbf q)$, this can happen only along an edge created by the special rule in Step 4, i.e. an edge connecting a duplex to its location vertex; every other edge connects identical indices.

For definiteness, we may label $W_1$ and $W_2$ so that, along the path from $W_1$ to $W_2$, the first such Step 4 edge is reached at its nonlocation (duplex) vertex; call this vertex $U_1$. Clearly $U_1$ and $W_1$ have the same index. Let $E$ be the edge connecting $U_1$ to its location vertex $V_1\in C$. By the choice of $U_1$, the index of $V_1$ differs from that of $U_1$. Let $D$ be the set of all vertices with the same value as $U_1$, and let $D_1$ be the set of all vertices with the same value as $V_1$. Then $D$ and $D_1$ are disjoint subsets of $C$.

We claim that, apart from $V_1$, $D_1$ consists of nonlocation vertices only. Suppose not. Then there is another location vertex $V_1'$ with the same value as $V_1$, so $V_1$ and $V_1'$ belong to a group with multiplicity at least two. In that case, however, the duplex $U_1$ would not have been connected to its location vertex, which is a contradiction.

The number of independent nonlocation indices in $D$ is exactly one, namely the index of $W_1$. The number of independent nonlocation indices in $D_1$ is zero, since they take the same value as a location index.
Suppose that $D \cup D_1$ does not exhaust $C$. In order that $D \cup D_1$ be connected to another vertex with a different value, once again there must be an edge $E'$ connecting a duplex vertex to its location vertex; one of these two vertices must lie in $D \cup D_1$ and the other in its complement. We claim that the duplex is in $D \cup D_1$. Indeed, the location vertex cannot be in $D \cup D_1$: $D$ contains no location vertex at all (otherwise the index of $U_1$ would not be independent), and the only location vertex in $D_1$ is $V_1$, which is already connected within $D \cup D_1$ to its duplex.

Let $U_2$ denote the duplex in $D \cup D_1$ that is connected to its location vertex $V_2 \notin D \cup D_1$, and let $D_2$ denote the set of vertices with the same value as $V_2$. As before, we can establish that $D_2$ contains only nonlocation indices apart from $V_2$, and that there is no independent nonlocation index in $D_2$.

If $D \cup D_1 \cup D_2$ does not exhaust $C$, we continue the process by defining new sets $D_3, D_4$, etc., until $C$ is exhausted, but we never obtain a new independent nonlocation index. This proves that $N_{\rm ind}(C) \le 1$.

□ 
We can now start proving (7.63). We fix the parameters $q$, $n$ and $v$ and omit them from the notation. We distinguish the following cases, which clearly cover all possibilities.

Case 1. $C$ consists of a duplex $(q_\alpha)_d$ and its location index $q_\alpha^1$.

Setting $v := (q_\alpha^1, q_\alpha^2)$, we know, in particular, that neither $h_v$ nor $h_{v^t}$ appears in any other $\xi(q_\beta)$, $\beta \ne \alpha$, since $C$ is an isolated component, not connected to any other vertices. Then, by the observation made in (7.37), $h_v$ (or $h_{v^t}$) must explicitly appear in (7.38), and it clearly must appear in one of the following ways, with some $\beta \ne \alpha$:

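$$\text{(1)}\quad G_{f q_\alpha^1}\, h_{q_\alpha^1 q_\alpha^2}\, G_{q_\alpha^2 g} \quad\text{or}\quad G_{f q_\alpha^2}\, h_{q_\alpha^2 q_\alpha^1}\, G_{q_\alpha^1 g}, \qquad \text{both Green functions off-diagonal}, \tag{7.64}$$

$$\text{(2)}\quad h_{q_\alpha^1 q_\alpha^2}\, G_{q_\alpha^2 q_\alpha^2}\, h_{q_\alpha^2 q_\alpha^1} \quad\text{or}\quad h_{q_\alpha^2 q_\alpha^1}\, G_{q_\alpha^1 q_\alpha^1}\, h_{q_\alpha^1 q_\alpha^2}. \tag{7.65}$$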

The reason why only these possibilities can occur is that the indices $q_\alpha^i$, $i = 1, 2$, appear only in $C$. So either (1) both Green functions neighboring $h_v$ (or $h_{v^t}$) are off-diagonal, or (2) one of the neighboring Green functions is diagonal. In the latter case, however, the expansion must continue on the other side of this diagonal Green function with another factor $h_{q_\alpha^1 q_\alpha^2}$ (or $h_{q_\alpha^2 q_\alpha^1}$). The reason for this last statement is that the expansion cannot start or terminate with a diagonal Green function of the form $G_{q_\alpha^1 q_\alpha^1}$ or $G_{q_\alpha^2 q_\alpha^2}$, since that would entail that $q_\alpha^1$ (or $q_\alpha^2$) equals $q_\beta^2$ or $q_\beta^3$, which would mean that $C$ contained other elements as well.

In the first case, $n(C) \ge 1$ and we have identified two indices of off-diagonal Green functions associated with $q_\alpha^2$, i.e. $O(C) \ge 1$. In the second case, we find that $h_v$ or $h_{v^t}$ appears altogether twice, and hence $n(C) \ge 2$. Since $N_{\rm ind}(C) = 1$ in this case, we have thus proved that in both cases

$$W(C) + n(C) + O(C) \ge 2 = 2N_{\rm ind}(C). \tag{7.66}$$

Notice that we did not use the weight $W(C)$ here.

Case 2. C is an isolated non-duplex vertex. 

Since $C$ is nontrivial, we can assume that $C$ consists of a single vertex $(2, \alpha)$ (the case of $(3, \alpha)$ is identical). Let $v := (q_\alpha^1, q_\alpha^2)$. Consider the expansion of $G^{(\alpha)}_{q_\alpha^2 q_\alpha^3}$, see (7.16). The first and the last Green functions in this expansion will be called extreme Green functions; if $n_\alpha = 0$, then the single Green function $G_{q_\alpha^2 q_\alpha^3}$ will be called extreme. Since this expansion contains $h_{\mu\nu}$ factors only with $\mu \in Q^{(\alpha)}$, and $q_\alpha^2 \ne q_\beta^j$ for any $\beta \ne \alpha$ (since $C$ is an isolated vertex), $q_\alpha^2$ cannot appear as an index of any $h_{\mu\nu}$. Hence the first Green function in (7.16) must be of the form $G_{q_\alpha^2 f}$ with some $f \ne q_\alpha^2$, i.e. it must be off-diagonal, and thus $O(C) \ge 1/2$. Furthermore, $h_v$ or $h_{v^t}$ must appear as

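$$\text{(1)}\quad h_{q_\alpha^1 q_\alpha^2}\, G_{q_\alpha^2 f} \quad\text{or}\quad G_{f q_\alpha^2}\, h_{q_\alpha^2 q_\alpha^1}, \qquad f \ne q_\alpha^2, \tag{7.67}$$

$$\text{(2)}\quad h_{q_\alpha^1 q_\alpha^2}\, G_{q_\alpha^2 q_\alpha^2}\, h_{q_\alpha^2 q_\alpha^1} \quad\text{or}\quad h_{q_\alpha^2 q_\alpha^1}\, G_{q_\alpha^1 q_\alpha^1}\, h_{q_\alpha^1 q_\alpha^2}. \tag{7.68}$$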

In the first case (1), we have identified another index of an off-diagonal Green function associated with $q_\alpha^2$, so $O(C) \ge 1$ and $n(C) \ge 1$. In the second case (2), we find that $h_v$ and $h_{v^t}$ appear altogether twice, and thus $n(C) \ge 2$. In both cases we have proved (7.63), since $N_{\rm ind}(C) = 1$. Again, the weight $W(C)$ was not used.

Case 3. $C$ has only one non-location vertex, $(i, \alpha)$ with $i = 2$ or $3$, and at least one location vertex $(1, \beta)$.

In this case the non-location index $q_\alpha^i$ is equal to a location index, hence $N_{\rm ind}(C) = 0$ and (7.63) is obvious.

Case 4. C has more than one non-location vertex. 

Suppose that the weight of a non-location vertex $(2, \alpha)$ in $C$ is zero. Then $h_{q_\alpha^1 q_\alpha^2}$ (or $h_{q_\alpha^2 q_\alpha^1}$) must appear in (7.49) (apart from the $\xi$ factors), and thus it contributes one to $n(C)$. Here we use the following observation:

(∗) If $h_{q_\alpha^1 q_\alpha^2}$ and $h_{q_\alpha^2 q_\alpha^1}$ appear in $\prod_\beta \xi(q_\beta)$ at least twice, then either $(2, \alpha)$ is a twin vertex or the multiplicity of the group containing $(2, \alpha)$ is more than one.

Both cases contradict our definitions: twins are not part of $S(q)$, and non-location vertices in groups with higher multiplicity have nonzero weight. But if $h_{q_\alpha^1 q_\alpha^2}$ and $h_{q_\alpha^2 q_\alpha^1}$ appear only once in $\prod_\beta \xi(q_\beta)$ (namely, only in the factor $\xi(q_\alpha)$), then at least one of them needs to appear at least one more time in (7.49) to make the expectation nonzero.

Hence if we have at least two weight zero non-location vertices in $C$, then $n(C) \ge 2$ and (7.63) holds. Note that each of these two vertices contributes one to $n(C)$, since together with their own location vertices they must form two different labels; otherwise they would be part of a twin or of a group with multiplicity at least two, and their weight would not be zero. We can also assume that the total weight $W(C)$ is less than 2 and that, if there is a weight zero non-location vertex (hence $n(C) \ge 1$), the total weight is at most $W(C) \le 1/2$. In all other cases (7.63) follows trivially from $N_{\rm ind}(C) \le 1$.
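As a sanity check on this reduction, the short Python sketch below enumerates multisets of non-location vertex weights for a component with at least two non-location vertices and discards every pattern already settled by the three criteria just stated: two weight-zero vertices, $W(C) \ge 2$, or one weight-zero vertex together with $W(C) \ge 1$. It assumes, as the case analysis suggests, that each non-location vertex has weight 0, 1/2 or 1 (a duplex having weight 1); the function names are hypothetical and only illustrate the bookkeeping.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Assumption (for illustration): every non-location vertex has weight 0, 1/2 or 1.
HALF = Fraction(1, 2)
WEIGHTS = (Fraction(0), HALF, Fraction(1))


def settled_by_reduction(weights):
    """Return True if W(C) + n(C) >= 2 already follows from the reduction step."""
    zeros = sum(1 for w in weights if w == 0)
    total = sum(weights, Fraction(0))
    if zeros >= 2:                 # two weight-zero vertices give n(C) >= 2
        return True
    if total >= 2:                 # W(C) >= 2 suffices on its own
        return True
    if zeros == 1 and total >= 1:  # n(C) >= 1 together with W(C) >= 1
        return True
    return False


def remaining_weight_patterns(max_vertices=6):
    """Weight multisets (at least 2 vertices) not settled by the reduction step.

    Patterns with 5 or more vertices are always settled: at most one weight
    can be zero, so at least four weights are >= 1/2 and W(C) >= 2.
    """
    patterns = []
    for k in range(2, max_vertices + 1):
        for ws in combinations_with_replacement(WEIGHTS, k):
            if not settled_by_reduction(ws):
                patterns.append(ws)
    return patterns


for pattern in remaining_weight_patterns():
    print([str(w) for w in pattern])
```

Running it prints the four patterns $\{0, 1/2\}$, $\{1/2, 1/2\}$, $\{1/2, 1\}$ and $\{1/2, 1/2, 1/2\}$, which are the weight configurations of the remaining cases treated below.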

So we only have to consider the following remaining cases: 

1. The non-location vertices of $C$ consist of exactly two weight 1/2 vertices $v_1, v_2$.

First notice that these two vertices must have the same index. Otherwise they could be in the same connected component only if one of them, say $v_2$, were equal to a duplex $(q_\beta)_d$ with some $\beta \ne \alpha$, where $v_1 = q_\alpha^j$ ($j \in \{2, 3\}$), and this duplex belonged to a group with multiplicity one (an edge between vertices with different indices can be provided only by a special Step 4 edge between a duplex and its location vertex, and only if the corresponding group has multiplicity one). But in this case the weight of the non-location vertex $(2, \beta)$ in $C$ would be zero by (i) of Definition 7.1.
Thus the two vertices $v_1, v_2$ cannot be in the same column of the matrix (otherwise they would form a duplex), so without loss of generality we can assume that they are of the form $(2, \alpha)$ and $(2, \beta)$ with $\alpha \ne \beta$, and we know that $q_\alpha^2 = q_\beta^2$.

Consider first the case $q_\alpha^1 \ne q_\beta^1$. Since the common value $q_\alpha^2 = q_\beta^2$ appears only twice in $C$, both factors $h_{q_\alpha^1 q_\alpha^2}$ and $h_{q_\beta^1 q_\beta^2}$ (or their transposes) have to appear in (7.49). Thus $n(C) \ge 2$ and (7.63) holds.
Finally, consider the case $q_\alpha^1 = q_\beta^1$. Since $q_\alpha^2$ and $q_\beta^2$ have weight 1/2, they are not duplexes. By construction, we have to expand the Green function $G^{(\alpha)}_{q_\alpha^2 q_\alpha^3}$. Since $q_\alpha^2 \ne q_\alpha^3$, the first Green function in the expansion (7.49), $G_{q_\alpha^2 f}$, is off-diagonal (otherwise the expansion would begin with $G_{q_\alpha^2 q_\alpha^2} h_{q_\alpha^2 q_\alpha^1} \cdots$, but $h_{q_\alpha^2 q_\alpha^1}$ cannot appear in the expansion of $G^{(\alpha)}_{q_\alpha^2 q_\alpha^3}$). Hence $q_\alpha^2$ appears as an index of an extreme off-diagonal Green function. A similar statement holds for $q_\beta^2$. We have thus identified two indices of off-diagonal Green functions associated with $C$, so that $O(C) \ge 1$, and together with $W(C) \ge 1$ we obtain (7.63).

2. The non-location vertices of $C$ consist of exactly one weight 1/2 vertex and one weight 1 vertex.
Since the weight 1 vertex is a duplex, these two vertices cannot be in the same column of $q$. Without loss of generality, let $(2, \alpha)$ be the weight 1/2 vertex and let $(2, \beta)_d$ be the weight 1 vertex, $\alpha \ne \beta$. We consider two cases: $q_\alpha^1 \ne q_\beta^1$ and $q_\alpha^1 = q_\beta^1$. As before, in the first case $n(C) \ge 1$. In the second case, $q_\alpha^2$ cannot appear as an index of any $h_v$ in any other $\xi(q_\gamma)$ with $\gamma \ne \alpha, \beta$, since $C$ involves exactly two columns, namely the columns $\alpha$ and $\beta$. Thus $h_{q_\alpha^1 q_\alpha^2}$ or its transpose must appear in the expansion of $G^{(\alpha)}_{q_\alpha^2 q_\alpha^3}$, and therefore we can find $q_\alpha^2$ as one of the indices of an extreme off-diagonal Green function. Hence $O(C) \ge 1/2$ in the second case. Since $W(C) \ge 3/2$, we obtain (7.63) in both cases.

3. The non-location vertices of $C$ consist of exactly one weight 1/2 vertex and one weight zero vertex.
Since the two vertices have different weights, they are in different columns of the matrix. Without loss of generality, we can assume that the weight 1/2 vertex is $(2, \alpha)$ and the weight zero vertex is $(2, \beta)$ with $\alpha \ne \beta$. In this case, both $h_{q_\alpha^1 q_\alpha^2}$ and $h_{q_\beta^1 q_\beta^2}$ (or their transposes) have to appear in the expansion, thus $n(C) \ge 2$ and (7.63) holds.

4. The non-location vertices of C consist of exactly three weight 1/2 vertices. 

By arguments similar to those in the first case, we can show that these three vertices are in different columns, and we can thus assume that they are of the form $(2, \alpha)$, $(2, \beta)$ and $(2, \gamma)$ with distinct $\alpha, \beta, \gamma$. If $q_\alpha^1 = q_\beta^1 = q_\gamma^1$, then $q_\alpha^2$ appears as an index of an extreme off-diagonal Green function in the expansion of $G^{(\alpha)}_{q_\alpha^2 q_\alpha^3}$, and $O(C) \ge 1/2$. On the other hand, if one of the three location indices, say $q_\alpha^1$, differs from the other two, then $h_{q_\alpha^1 q_\alpha^2}$ (or its transpose) has to appear in the expansion and $n(C) \ge 1$. In either case, together with $W(C) \ge 3/2$, we obtain (7.63).
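The bookkeeping in the four remaining cases can be collected in one table. Each row restates the bounds established in the corresponding item, and in every row the left-hand side of (7.63) is at least 2, which suffices because $N_{\rm ind}(C) \le 1$ by Lemma 7.2.

```latex
\begin{array}{clll}
\text{Case} & W(C) & \text{bound on } n(C) \text{ or } O(C) & W(C) + n(C) + O(C) \\
1 & \ge 1   & n \ge 2 \ \text{or}\ O \ge 1   & \ge 2 \\
2 & \ge 3/2 & n \ge 1 \ \text{or}\ O \ge 1/2 & \ge 2 \\
3 & \text{(not used)} & n \ge 2        & \ge 2 \\
4 & \ge 3/2 & O \ge 1/2 \ \text{or}\ n \ge 1 & \ge 2
\end{array}
```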

The main point of the previous proof is that every weight 1/2 vertex is either associated with an index of an extreme off-diagonal Green function or has an $h$ factor associated with it. We have thus proved Proposition 7.1. □

Finally, we have to prove the inequality (7.56). Let $d$ denote the number of duplexes. Let $a_1$ be the number of nontrivial components $C$ that contain only one non-location vertex and let $a_2$ be the number of nontrivial components $C$ that contain at least two non-location vertices. Since by Lemma 7.2 we have $N_{\rm ind} = \sum_C N_{\rm ind}(C) \le a_1 + a_2$ and obviously $l \le p$, it is sufficient to show that

$$2p + n \ge 2(a_1 + a_2).$$

Since there are $2p - d$ non-location vertices, we have $2p - d \ge a_1 + 2a_2$; thus it is sufficient to show that $n + d \ge a_1$. But each component with a single non-location vertex, say $(2, \alpha)$, is either a duplex or gives rise to a factor $h_{q_\alpha^1 q_\alpha^2}$ (or its transpose) that must appear in the expansion, and hence contributes to $n$. This shows (7.56) and completes the proof of Lemma 4.1. □
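The two reductions in the last paragraph combine into a single chain of inequalities. Writing it out makes explicit where each ingredient enters: the count $2p - d \ge a_1 + 2a_2$ of non-location vertices, the bound $n + d \ge a_1$, and Lemma 7.2 in the final step.

```latex
\begin{aligned}
2p + n &= (2p - d) + (n + d) \\
       &\ge (a_1 + 2a_2) + a_1 \\
       &= 2(a_1 + a_2) \;\ge\; 2 N_{\rm ind}.
\end{aligned}
```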